Nature - 2020 12 10

The international journal of science / 10 December 2020
Eyes of the world are on medicines regulators

The race to produce COVID-19 vaccines is a chance to create a more harmonized approvals process.

The roll out of COVID-19 vaccines is under way, but without, it seems, much global coordination where timing is concerned. China, Russia and the United Arab Emirates began administering vaccines before the conclusion of clinical trials. Last week, the United Kingdom issued emergency approval for a vaccine developed by the US biopharmaceutical company Pfizer and BioNTech of Germany following positive results from phase III testing (see page 205). The US Food and Drug Administration (FDA) has needed longer to make its decision on the same vaccine. And the regulatory agencies of Australia, the European Union and Switzerland are taking longer still.

This patchwork of different approvals processes, despite COVID-19 being the one common enemy, has revived a long-standing question of how to enhance harmonization in vaccine regulation. Researchers reviewing the regulatory landscape found at least 50 pathways to various types of accelerated vaccine approval in a group of 24 countries (S. Simpson et al. npj Vaccines 5, 101; 2020).

Greater harmonization would bring many benefits. Drug companies could look forward to agreed definitions for different types of approval, and would also benefit from agreed guidelines for criteria that their vaccine candidates would need to meet. If countries’ regulators were to ask for broadly the same things, companies could cut the time needed to prepare their drug applications. Companies, for their part, would need to allow, or help to create, a secure way for regulators to share data, which they are often not permitted to do at present.

By assessing the same data, regulators could more easily compare their findings and analyses with those of others, and their decisions would not only be more robust, but also be seen to be so. That, in turn, would shore up public confidence in a world in which vaccine hesitancy is rising and in which many citizens already have the means to compare regulatory verdicts. This would be an evolutionary shift, not a revolutionary one, because in recent years, and particularly after the Ebola crisis, regulators have made unprecedented efforts to discuss, coordinate and begin to harmonize some of their processes.

The FDA, which was set up in 1906, is the world’s oldest national medicines regulator. But the world has been moving towards greater regulatory coordination for some time. Europe’s regulatory system comprises a network of 50 national regulatory bodies from 31 European countries. The European Medicines Agency (EMA), created in 1995, sits at the centre of this network. All countries have their own medical regulators, but the EMA provides manufacturers with a single place for scientific evaluation of drug applications, if they want Europe-wide approval.

In other regions, Africa for example, countries are also inching towards an approach that allows for the pooling of regulatory expertise. This is the purpose of the African Vaccine Regulatory Forum, set up in 2006.

And at the global level, in 2012, the member states of the World Health Organization (WHO) agreed to establish the International Coalition of Medicines Regulatory Authorities (ICMRA) to allow regulators to share information and agree on approaches. The ICMRA has 29 members, including regulators from China, Europe and the United States. Through it, members have been able to reach a consensus on the best animal models for testing COVID-19 vaccines, the ideal clinical-trial end points and the complicated issue of continuing placebo-controlled trials after vaccine roll out begins. The coalition’s COVID-19 working group is now trying to harmonize the monitoring of vaccines once they have been deployed, because faint signals of adverse effects might be too weak to spot in any one country.

And then there is the WHO itself. Low- and middle-income countries can now benefit from the work that goes into its Emergency Use Listing (EUL) process. On 13 November, the agency issued its first ever such vaccine listing for a new polio vaccine. Around the end of October, the WHO requested that both the FDA and the EMA assess the suitability of COVID-19 vaccines for low- and middle-income countries as they consider whether to issue emergency authorizations. It is not clear whether the regulators will agree, but if either does, the WHO can draw on that analysis and issue its own EUL within days of the decision. That would be collaboration indeed.

These are all important and necessary efforts. The need now is to go a step further and find a path through the many different types of vaccine approval. Before the pandemic, the Coalition for Epidemic Preparedness, a global group of funding agencies, companies and non-governmental organizations, set up a working group to map out obstacles to better regulatory alignment, in anticipation of a new infectious disease. This process confirmed how regulatory agencies differ on issues such as the use of genetic modification in vaccine development, trials in pregnant women, and even vial labelling. But it also meant that inconsistencies were already mapped out and under discussion when the pandemic struck. COVID-19 has intensified these discussions.

The next step will not be easy. Regulators want to be able to exchange data. Their experiences during the pandemic have convinced many that they are moving towards a point at which this will be possible. They want to be able to talk to each other in the same units and about the same end points; and to make decisions based on the same data.

Ultimately, each country must make its own decisions about what’s best. But the goal of a harmonized regulatory dossier for vaccines, conforming to an agreed set of international regulatory requirements, would be transformative.
Nature | Vol 588 | 10 December 2020 | 195

© 2020 Springer Nature Limited. All rights reserved.
Editorials
Accounting for sex and gender makes science better

The European Commission is set to insist on steps that will make research design more inclusive.

At the end of last month, the European Commission announced that its grant recipients will be required to incorporate sex and gender analyses into the design of research studies. The policy will affect researchers applying for grants that are part of the commission’s seven-year, €85-billion (US$100-billion) Horizon Europe programme, which is due to begin next year.

The funding is still awaiting sign-off from the European Union’s 27 member states. But if all goes to plan, the commission will be the largest funder to require sex and gender analyses (along with analyses of other aspects of inclusion, also known as intersectionality) in research design. Such analyses could include disaggregating data by sex when examining cells, or considering how a technology might perpetuate gender stereotypes.

It’s a significant achievement. Science will be strengthened by researchers incorporating analyses of sex and gender into their work at every stage, from study design to gathering data, analysing those data and drawing conclusions.

The European Commission is not the first funding agency to make such changes. And this isn’t the first time it has requested that studies account for sex and gender. But in Horizon Europe, the requirement becomes a mandate, and is expected to extend, by default, to most grant recipients. Exceptions will be made only for those working on topics for which the commission thinks such studies would not be relevant, such as in pure mathematics.

Science and scientists have a troubled history of failing to account for sex and gender when designing research. For decades, crash test dummies were based on male bodies. Even though smaller models are now used to represent women, they fail to account for some other typical differences, such as neck strength (ref. 1). The inclusion of sex and gender analyses can also be revelatory. Sea turtles in Australia’s Great Barrier Reef are being born mostly female because of warming temperatures, a discovery that was made when researchers were able to analyse male and female populations (ref. 2).

In some cases, the results of not accounting for sex and gender have been catastrophic. Between 1997 and 2001, ten prescription drugs were withdrawn from use in the United States, eight of which had been found to be more dangerous for women than men. This had been missed, in part, because women had not been properly represented among trial participants when the drugs were first evaluated.

Although sex and gender analysis is improving in drugs trials, it remains a work in progress in many fields, Londa Schiebinger, a science historian at Stanford University in California, told Nature (see page 209). Researchers have been highlighting the harms caused by failing to account for sex and gender for decades, but it wasn’t until after the turn of the millennium that funding bodies really started to address the problem. The Canadian Institutes of Health Research began to request that analyses of sex and gender be included in grant applications in 2010, and the US National Institutes of Health followed suit in 2016.

The European Commission began asking grant recipients to include sex and gender analysis in their research design in 2013, a request which, by 2020, covered around one-third of research fields. But according to later evaluation reports, fewer researchers than expected implemented this request.

An analysis of researchers funded by the Canadian Institutes of Health Research, published in 2014, revealed that some had pushed back when asked to consider sex and gender (ref. 3). And both this analysis and the European Commission’s evaluation highlighted that some grant recipients used sex (which refers to biological characteristics) interchangeably with gender (which is a social construct and is not necessarily aligned with a person’s sex). To help researchers to better appreciate the value of sex and gender analysis, the commission’s expert advisory group, which Schiebinger chairs, has published 15 case studies as examples of good practice (go.nature.com/33vxcxz).

Another positive action could be for research teams to include appropriate specialists to advise on, participate in, or lead the design of more-inclusive research. Groups could include researchers from the social or health sciences: the Canadian Institutes of Health Research analysis revealed that health- and social-science researchers are more likely to include sex and gender analyses in project design than are researchers in the biomedical sciences.

Ultimately, inclusive research design cannot be the sole responsibility of funders. Some journals, including Nature, are requesting that authors include sex and gender analyses, when appropriate. Universities and research supervisors also need to incorporate inclusive design into the research methodology training they provide to students.

The European Commission is rightly adding its considerable voice to the effort to ensure that science is designed and carried out in a more inclusive way. But to change practices that have existed for centuries, more researchers, especially research leaders, need to accept where they have been going wrong, and how research and individuals have suffered as a result. The foundations are being laid for better science, and the more hands join in this important effort, the better.

1. Linder, A. & Svedberg, W. Accid. Anal. Prev. 127, 156–162 (2019).
2. Jensen, M. P. et al. Curr. Biol. 28, 154–159 (2018).
3. Johnson, J., Sharman, Z., Vissandjée, B. & Stewart, D. E. PLoS ONE 9, e99900 (2014).
World view
A personal take on science and society

Institutions can retool for more-rigorous research

Big moves to rebuild the scientific infrastructure are possible.

By Ulrich Dirnagl

Five years ago, I was part of a small group of ‘activists’ who convinced the Berlin Institute of Health (BIH), where I work, to try out a set of reforms intended to improve the trustworthiness, usefulness and ethics of research. Things grew from there: three years ago, with the help of government grants and some nudging by a retired local politician, we secured €2.5 million (US$2.9 million) per year for efforts to build up incentives and technologies that increase rigour.

We were inspired by initiatives at other universities, such as the reforms that Frank Miedema introduced during his deanship at the University Medical Center Utrecht in the Netherlands. But when the QUEST Center (QUEST stands for Quality, Ethics, Open Science and Translation) launched at the BIH, there was no precedent or blueprint for a programme of this scale.

From the beginning, we presumed that researchers and clinician–scientists are skilled professionals who want to ‘do the right thing’ but are also under pressure to accrue publications to advance their careers. Doing quality research takes time and humility, so unless we changed the system, researchers who pursued quality-enhancing practices could have found themselves at a disadvantage.

What was the solution? We made sure that we were viewed as a resource, not a policing unit. We selected interventions that we thought we could implement. Alongside introducing courses on experimental design and methods aimed at reducing bias, we focused on practices to increase the transparency of research. One push was for the use of electronic laboratory notebooks (ELNs), which improve research documentation and make collaboration easier. We made sure that QUEST, and not individual labs, covered the licence fees and provided plenty of support. So far, nearly 2,000 of our 7,000 researchers, PhD students and technicians are registered ELN users; my guess is that about half of these have an ELN as their primary lab notebook. For many, ELNs are a necessary first step towards systematically managing their research data, which QUEST also supports.

We simultaneously adjusted the incentive and reward system. When hiring professors and awarding institutional funds, we now consider how thoroughly and quickly people share their results. Those who make original data available in publications are rewarded with a financial bonus that can be spent on research. QUEST works with the BIH and the leadership of the Charité, Berlin’s university medical centre, to ensure that evaluation criteria encompass responsible research practices, including publication of null results, provision of open data and community engagement. A QUEST good-evaluation-practice officer has sat as an independent assessor on hiring commissions for 10 of the past 29 hiring calls.

We tried to craft a system designed for its own improvement. For example, we have developed an anonymous online tool through which researchers have reported hundreds of errors and worrying incidents (U. Dirnagl et al. PLoS Biol. 14, e2000705; 2016). This has allowed us to learn from errors: for example, a technician realized that ambiguous labelling of cell-culture media by a manufacturer had spoiled her experiment. Her swift reporting prevented others from making the same mistake. The company changed the labels on its flasks and alerted other customers. After we saw many errors stemming from the use of pipettes outside the calibrated range, we set up ‘pipetting exercises’ and saw the rate of these errors fall.

Three years in, we’re seeing more papers published open access and with open data. We’re also seeing greater participation in educational activities and in intramural programmes using responsible selection criteria, such as engagement with patient communities, reuse of data or preregistration. Of course, funders and journals are also pulling in the same direction, so it is impossible to know which changes are due to the efforts of QUEST.

However, we still have a long way to go. Our benchmarking study found that, within 2 years of completion, only 40% of studies sponsored by the Charité had reported results (S. Wieschowski et al. J. Clin. Epidemiol. 115, 37–45; 2019). Furthermore, 5 years after completion, more than 30% of results remained unavailable. But we hope to correct this. We use counselling and web tools to offer guidance on how to publish null, inconclusive, negative and other ‘nonstandard’ results, and award monetary research bonuses for the publication of negative results or replication studies.

Most faculty members welcome our activities, and we are working to expand student and researcher engagement. For example, using funding from the biomedical research charity Wellcome in London, we have established fellowships for mid-career researchers who collaborate to develop and track initiatives for improving science in their own research groups. Our experience shows that structured programmes can be rolled out by any academic institution that is willing and able to improve its research in a systematic fashion. The budget of QUEST is less than 1% of our institution’s state funding for research and teaching, not including monies from third-party funders.

QUEST started from scratch. But many institutions already promote activities such as open science, data management and responsible research. If they align their efforts, they can expand them and incorporate scientific ideals into incentive structures. The quality of science and the culture of the workplace will be better off.

Ulrich Dirnagl directs the QUEST Center at the Berlin Institute of Health.
e-mail: ulrich.dirnagl@charite.de

The world this week
News in brief
COVID-19 VACCINES ARE NOT BEING SHARED EQUALLY

Vaccine developers who have already reported promising phase III trial results against COVID-19 estimate that, between them, they can make sufficient doses for more than one-third of the world’s population by the end of 2021. But many people in low-income countries might have to wait until 2023 or 2024 for vaccination. Five rich countries and the European Union have pre-ordered about half of expected capacity for 2021, according to data from Airfinity, a life-sciences market analytics firm in London. Canada leads on vaccine deals per capita, with nearly nine doses per person. Most low- and middle-income countries will rely on contributions from COVAX, a joint fund for equitable vaccine distribution.

[Chart: BEST AND WORST SUPPLIED. Canada has pre-ordered almost 9 doses of COVID-19 vaccines per person. Bars show doses per person (axis 0 to 10), split into ‘Pre-ordered’ and ‘Potential for expansion in deal’, for Canada, United States, United Kingdom, Australia, European Union, Japan, Vietnam, India, Israel, Switzerland, Indonesia, Brazil, Latin America (excl. Brazil), Egypt, Mexico, China and COVAX. Source: data from Airfinity, up to 19 November/Nature tabulations.]

ARECIBO TELESCOPE COLLAPSES IN GUT-WRENCHING DISPLAY

The iconic radio telescope at the Arecibo Observatory in Puerto Rico has collapsed, leaving astronomers and the Puerto Rican scientific community to mourn its demise.

Engineers had warned that the 900-tonne platform suspended above the telescope’s 305-metre-wide dish could fall at any moment, given that one of the main cables supporting it had snapped in early November. Last month, the US National Science Foundation, which owns the observatory, announced that it would shut down the telescope permanently, citing safety concerns over its instability, and damage too extensive to repair.

The platform plummeted into the dish after some cables failed just before 8 a.m. local time on 1 December. No one was injured.

Once the world’s largest single-dish radio telescope, the Arecibo facility has been the site of many key astronomical discoveries over the years, including observations of the spinning stars known as pulsars that led to the 1993 Nobel Prize in Physics.

“Our hearts are heavy about this,” said Thomas Zurbuchen, NASA’s associate administrator for science, at a 1 December NASA advisory meeting. It is unclear whether the dish will be demolished, rebuilt or left in ruins.

MILKY WAY MAP REVEALS ONE BILLION STARS IN MOTION

A huge data update from the Gaia space observatory, which is tracking more than one billion stars in the Galaxy, offers a picture of what Earth’s night sky will look like for 1.6 million years to come.

The European Space Agency probe lifted off in late 2013, and began observing stars in July 2014 from a perch 1.5 million kilometres from Earth. Gaia continuously scans the sky as it slowly spins, and it has now measured the positions of the same stars multiple times. This enables scientists to track stars’ nearly imperceptible motions across the Galaxy year after year, and to triangulate their positions using a technique called parallax.

The latest update is based on around three years of data, and includes a complete census of the Sun’s neighbourhood: all but the faintest stars within 100 parsecs (326 light years), totalling more than 300,000 objects. The mission has expanded its catalogue of stars by 15%, and its measurements have become more precise. The data will underpin studies that range from the origins and evolution of the Galaxy to locating its dark matter.
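For readers who want to check the Gaia item's distance figures, the parallax relation can be sketched in a few lines. This is a minimal illustration, not the mission's own processing: it assumes the textbook small-angle relation (distance in parsecs is the reciprocal of the parallax angle in arcseconds) and the standard conversion of roughly 3.26 light years per parsec. The function names are hypothetical.

```python
# Minimal sketch of the parallax distance relation (assumed textbook form,
# not Gaia's actual pipeline). Function names are illustrative only.

LIGHT_YEARS_PER_PARSEC = 3.26156  # standard conversion factor


def distance_parsecs(parallax_arcsec: float) -> float:
    """Distance in parsecs from an annual parallax angle in arcseconds."""
    if parallax_arcsec <= 0:
        raise ValueError("parallax must be positive")
    return 1.0 / parallax_arcsec


def parsecs_to_light_years(parsecs: float) -> float:
    """Convert a distance in parsecs to light years."""
    return parsecs * LIGHT_YEARS_PER_PARSEC


if __name__ == "__main__":
    # A parallax of 0.01 arcseconds (10 milliarcseconds) corresponds to the
    # 100-parsec edge of the census described in the item.
    d = distance_parsecs(0.01)
    print(f"{d:.0f} pc is about {parsecs_to_light_years(d):.0f} light years")
```

The smaller the parallax angle, the more distant the star, which is why surveying large volumes of the Galaxy depends on very precise position measurements.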

The world this week
News in focus
‘IT WILL CHANGE EVERYTHING’: AI MAKES GIGANTIC LEAP IN SOLVING PROTEIN STRUCTURES

DeepMind’s program for determining the 3D shapes of proteins stands to transform biology, say scientists.

By Ewen Callaway

[Image caption: A protein’s function is determined by its 3D shape.]

An artificial intelligence (AI) network developed by Google AI offshoot DeepMind has made a gargantuan leap in solving one of biology’s grandest challenges: determining a protein’s 3D shape from its amino-acid sequence.

DeepMind’s program, called AlphaFold, outperformed around 100 other teams in a biennial protein-structure prediction challenge called CASP, short for Critical Assessment of Structure Prediction. The results were announced on 30 November, at the start of the conference (held virtually this year) that takes stock of the exercise.

“This is a big deal,” says John Moult, a computational biologist at the University of Maryland in College Park, who co-founded CASP in 1994 to improve computational methods for accurately predicting protein structures. “In some sense the problem is solved.”

The ability to accurately predict proteins’ structures from their amino-acid sequences would be a huge boon to life sciences and medicine. It would vastly accelerate efforts to understand the building blocks of cells and aid more advanced drug discovery.

AlphaFold came top of the table at the last CASP, in 2018, the first year that London-based DeepMind participated. But, this year, the outfit’s deep-learning network was head-and-shoulders above other teams and, say scientists, performed so mind-bogglingly well that it could herald a revolution in biology.

“It’s a game changer,” says Andrei Lupas, an evolutionary biologist at the Max Planck Institute for Developmental Biology in Tübingen, Germany, who assessed the performance of different teams in CASP. AlphaFold has helped him to find the structure of a protein that has vexed his laboratory for a decade. “This will change medicine. It will change research. It will change bioengineering. It will change everything,” Lupas adds.

In some cases, AlphaFold’s structure predictions were indistinguishable from those determined using ‘gold standard’ experimental methods such as X-ray crystallography and, in recent years, cryo-electron microscopy (cryo-EM). AlphaFold might not obviate the need for these laborious and expensive methods yet, say scientists, but the AI will make it possible to study living things in new ways.

The structure problem

Proteins are the building blocks of life, responsible for most of what happens inside cells. How a protein works and what it does is determined by its 3D shape. Proteins tend to adopt their shape without help, guided only by the laws of physics.

For decades, laboratory experiments have been the main way to obtain good protein structures. The first complete structures of proteins were determined, starting in the 1950s, using a technique in which X-ray beams are fired at crystallized proteins and the diffracted light translated into a protein’s atomic coordinates. X-ray crystallography has produced the lion’s share of protein structures. But, over the past decade, cryo-EM has become the favoured tool of many structural-biology labs.

Scientists have long wondered how a protein’s constituent amino acids map out the twists and folds of its eventual shape. Early attempts to use computers to predict protein structures in the 1980s and 1990s performed poorly. Lofty claims for methods in published papers tended to disintegrate when other scientists applied them to other proteins.

Moult started CASP to bring rigour to these efforts. The event challenges teams to predict the structures of proteins that have been solved using experimental methods, but for which the structures are not public.

DeepMind’s 2018 performance at CASP13 startled many scientists in the field, which has long been the bastion of small academic groups. But its approach was broadly similar to those of other teams that were applying AI, says Jinbo Xu, a computational biologist at the University of Chicago, Illinois.

The first iteration of AlphaFold applied the AI method known as deep learning to structural and genetic data to predict the distance between pairs of amino acids in a protein. In a second step, which does not invoke AI, AlphaFold uses this information to come up with a ‘consensus’ model of what the protein should look like, says John Jumper at DeepMind, who is leading the project.

The team tried to build on that approach but eventually hit the wall. So it changed tack, says Jumper, and developed an AI network that incorporated additional information about the physical and geometric constraints that determine how a protein folds. The team also set it a more difficult task: instead of predicting relationships between amino acids, the network predicts the final structure of a target protein sequence. “It’s a more complex system by quite a bit,” Jumper says.

Startling accuracy

CASP takes place over several months. Target proteins or portions of proteins called domains (about 100 in total) are released on a regular basis, and teams have several weeks to submit their structure predictions. A team of independent scientists assesses the predictions using metrics that gauge how similar a predicted protein is to the experimentally determined structure. The assessors don’t know who is making a prediction.

AlphaFold’s predictions arrived under the name ‘group 427’, but the startling accuracy of many of its entries made them stand out, says Lupas. “I had guessed it was AlphaFold. Most people had,” he says.

Some predictions were better than others, but nearly two-thirds were comparable in quality to experimental structures. In some cases, says Moult, it was not clear whether the discrepancy between AlphaFold’s predictions and the experimental results was a prediction error or an artefact of the experiment. AlphaFold also struggled to model individual structures in protein complexes.

Faster structures

An AlphaFold prediction helped to determine the structure of a bacterial protein that Lupas’s lab has been trying to crack for years. Lupas’s team had previously collected raw X-ray diffraction data, but transforming these patterns into a structure requires some information about the shape of the protein. Tricks for getting this information, as well as other prediction tools, had failed. “The model from group 427 gave us our structure in half an hour,” Lupas says.

Demis Hassabis, DeepMind’s co-founder and chief executive, says that the company plans to make AlphaFold useful to other scientists. (It previously published enough details about the first version of AlphaFold for other researchers to replicate the approach.) It can take AlphaFold days to come up with a predicted structure, which includes estimates on the reliability of different regions of the protein. “We’re just starting to understand what biologists would want,” adds Hassabis.

In early 2020, the company released predictions for a handful of SARS-CoV-2 protein structures that hadn’t been determined experimentally. DeepMind’s predictions for a protein called Orf3a ended up being similar to one later determined through cryo-EM, says Stephen Brohawn, a molecular neurobiologist at the University of California, Berkeley, whose team released the structure in June. “What they have been able to do is very impressive,” he adds.

AlphaFold is unlikely to remove the need for labs, such as Brohawn’s, that use experimental methods to solve protein structures. But it could mean that lower-quality experimental data would be all that’s needed to get a good structure. Some applications, such as the evolutionary analysis of proteins, are set to flourish. “This is going to empower a new generation of molecular biologists to ask more advanced questions,” says Lupas. “It’s going to require more thinking and less pipetting.”

[Figure: STRUCTURE SOLVER. DeepMind’s AlphaFold 2 algorithm significantly outperformed other teams at the CASP14 protein-folding contest, and its previous version’s performance at the last CASP. Vertical axis: global distance test (GDT_TS; average), 0 to 100. Horizontal axis: contest year, 2006 to 2020. Labelled points: AlphaFold and AlphaFold 2. Annotation: a score above 90 is considered roughly equivalent to the experimentally determined structure. Source: DeepMind.]
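The global distance test (GDT_TS) named as the CASP scoring metric can be illustrated with a short sketch. It rests on stated assumptions and is not the assessors' software: the predicted and experimental structures are taken to be already superimposed and matched residue by residue (the full method also searches over superpositions), each residue is represented by one coordinate triple in ångströms, and the score is the average percentage of residues lying within 1, 2, 4 and 8 Å of their experimental positions. The function name and the toy coordinates are hypothetical.

```python
# Simplified GDT_TS sketch. Assumes both structures are already superimposed
# and list one (x, y, z) position per residue in the same order.
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def gdt_ts(predicted: Sequence[Point],
           experimental: Sequence[Point],
           cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Average percentage of residues within each distance cutoff (0-100)."""
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("structures must be non-empty and the same length")
    # Per-residue deviation between predicted and experimental positions.
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    # Fraction of residues that fall within each cutoff.
    fractions = [sum(d <= c for d in distances) / len(distances)
                 for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    # Toy four-residue example with deviations of 0.5, 1.5, 3.0 and 9.0 Å.
    experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0),
                    (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
    predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0),
                 (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]
    print(gdt_ts(predicted, experimental))
```

Averaging over several cutoffs means one badly placed residue lowers the score only modestly, whereas a score near 100 requires almost every residue to sit within about 1 Å of its experimental position.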

remain susceptible to asymptomatic infection
— and could transmit that infection to others
who remain vulnerable. “In the worst-case
scenario, you have people walking around
feeling fine, but shedding virus everywhere,”
says virologist Stephen Griffin at the Univer-
sity of Leeds, UK.
Pfizer has said that its scientists are looking
at ways to assess virus transmission in future
studies. For now, AstraZeneca and the Univer-
sity of Oxford might be able to provide the
first hints as to whether a vaccine can protect
against such transmission. Although they have
yet to publish complete results, their trial did
routinely test participants for SARS-CoV-2,
allowing investigators to track whether people
became infected without developing symptoms.
Early indications are that the vaccine
might have reduced the frequency of such
infections, which would suggest that trans-
mission might also be reduced.
How long will vaccine-induced

Large-scale trials of COVID-19 vaccines have been conducted at an unprecedented speed. immunity last?
There is no quick way to determine how long
COVID VACCINES:
immunity to the SARS-CoV-2 virus will last, and
researchers will need to monitor this closely
WHAT SCIENTISTS
in the coming months and years.
There have been some reports that peo-
NOW WANT TO KNOW

ple who have had one bout of COVID-19 and
developed antibodies against it can experi-
ence falling antibody levels and even reinfec-
tion months later, but it is still unclear how
UK approves Pfizer–BioNTech vaccine. Researchers prevalent reinfection is. There are signs that
the immune system preserves a memory of
are watching how it and others will perform. coronavirus infection in the form of special-
ized memory cells that could kick into action
By Heidi Ledford, David Cyranoski & a request for emergency approval to the US rapidly if the virus is encountered again.
Richard Van Noorden
W
Food and Drug Administration. The trial has And vaccines, Altmann says, are deliberately
so far gathered data from only 170 cases of designed to provoke strong responses from
ith striking speed, the United COVID-19 across its control and intervention the immune system.
Kingdom has become the first arms, and real-world efficacy might be lower Still, it will be important for public-health
country to approve a COVID-19 than in a trial, but it is still an extraordinarily officials to monitor immunity — and to know
vaccine that has been tested in a promising result, says immunologist Danny when it begins to wane. One way to do that,
large clinical trial. On 2 Decem- Altmann at Imperial College London: “This is in addition to keeping track of infections
ber, UK regulators granted emergency-use brilliant news.” among people who have received the shots,
authorization to a vaccine from drug firms The approval is a historic moment. But is to assess their levels of antibodies and
Pfizer and BioNTech, just seven months after scientists still have many questions about how immune cells periodically. Tracking how
the start of clinical trials. Hospitals have this and other vaccines will perform as they’re these immune responses change could give
already administered the first doses; front- rolled out to millions of people. an early indication of when they are waning to
line health-care workers, care-home staff and worrisome levels, says Altmann. But the wide
residents are at the head of the queue. Do the vaccines prevent variation in people’s immune responses could
China and Russia have approved vaccines transmission of SARS-CoV-2? make it a challenge to understand the circum-
already, but without waiting for the immuni- In addition to the Pfizer vaccine, regulators stances in which a vaccine doesn’t work, and
zations to complete the final round of tests in are poring over data from a similar vaccine such studies will need to track many people.
people. Regulators in the United States and the made by Moderna of Cambridge, Massachu- “You need to have a good stab at some high-
European Union are expected to issue their setts, and a third produced by AstraZeneca of level population analysis to work out whether
decisions on the Pfizer vaccine in the coming Cambridge, UK, and the University of Oxford, you’re winning or losing,” says Altmann. “Oth-
weeks. UK. All three have been tested in large clinical erwise, you might be a government kidding
Tests on more than 43,000 people have trials, and have shown promise in preventing yourself in years’ time.”
shown that it is 95% effective at preventing disease symptoms.
disease when measured a week after partic- But none has demonstrated that it prevents How well do the vaccines work in
ipants are given their second dose, the New infection altogether, or reduces the spread of older people and other groups?
York City-based firm said in November when it the virus in a population. This leaves open the The major vaccine trials so far have enrolled
and BioNTech, in Mainz, Germany, submitted chance that those who are vaccinated could tens of thousands of people, but for each one,

News in focus
conclusions about effectiveness are drawn from fewer than 200 people who have developed disease. As a result, it can be difficult to break up the data to look at efficacy in different groups — such as people who are obese or elderly — without losing statistical power. More data are needed across demographics, says Michael Head, an infectious-disease researcher at the University of Southampton, UK.

There are early indications that the three leading vaccines protect people over 65. But researchers will probably need real-world data from large numbers of vaccinated people before they can get the demographic granularity necessary to ensure that parts of the population aren’t left unprotected.

There are no data yet on how the vaccines fare in children and pregnant women. On 2 December, Moderna unveiled plans to test its vaccine in adolescents.

How do the vaccines stack up against each other?

All three leading vaccines have probably beaten the goal of achieving 50% efficacy, and all seem to be safe, on the basis of the clinical-trial data so far. But there might be differences in how well they work.

The vaccines from Pfizer and Moderna rely on RNA encased in a lipid particle that ferries it into cells, where it helps to generate a viral protein that stimulates the immune system. AstraZeneca’s vaccine uses DNA that is shuttled into cells inside a harmless virus.

Early data suggest that the RNA approach might be more effective for preventing disease symptoms developing. But there are subtle differences in the immune responses provoked by each approach, notes Griffin. Researchers might eventually find that one approach works better than another in certain groups of people, or that one is the best at limiting transmission.

Differences in costs and logistics will also shape which vaccine is best for which region. Shortly after the UK government announced the authorization of the Pfizer vaccine, officials acknowledged that getting the vaccine to residents in individual care homes would be a challenge, because it needs to be stored at extremely low temperatures (−70 °C). The other two vaccines do not need to be kept at such low temperatures, and the AstraZeneca immunization is likely to be the easiest and cheapest to store, says Head.

Comparisons between the effectiveness of the different vaccines are important and should be done, but until then, the path forward is clear, says Altmann. “Grab any vaccine that your government can buy,” he says.

Could the virus evolve to evade immunity given by vaccines?

Some viruses, such as the influenza virus, are notorious for mutating. The SARS-CoV-2 genome, however, seems to be fairly stable so far. Most of the vaccines being developed, including the three that lead the pack, target a molecule called the spike protein, which the virus needs to infect cells. And immune responses elicited by those vaccines will probably target multiple sites on that protein.

This gives researchers some reassurance that the virus might not evolve ways to evade immunity. But mass vaccination campaigns will, for the first time, put enormous pressure on SARS-CoV-2 to adapt, and will select for any strain of the virus that might be able to escape immune defences. “We’ve never seen a virus like this under selective pressure,” says Griffin. “So we don’t know how it’s going to respond.”

As a result, researchers will need to monitor samples of SARS-CoV-2 for signs of change, says Charlie Weller, head of vaccines at the biomedical research charity Wellcome in London. “Robust surveillance with ongoing sampling and sequencing will be key,” she says.

How will scientists monitor for long-term safety concerns?

The Pfizer vaccine has completed only a few months of the two-year clinical-trial period needed before it is approved to be sold freely on the market. As a result, people will be watching closely for as-yet unobserved signs of danger.

Clinical trials vet vaccines rigorously for potential side effects with a combination of self-reporting from participants and data collection by clinicians. Pfizer’s trials revealed that some recipients experienced pain at the injection site, along with fever, fatigue, sore muscles and headaches — although these symptoms are generally not serious.

But after a vaccine is approved, whether fully or only for emergency use, clinicians are expected to continue reporting any adverse reactions. Many countries have some kind of programme, such as the US Vaccine Adverse Event Reporting System, that collects reports of serious symptoms after people receive a vaccine. US doctors are legally bound to report such symptoms. For COVID-19 drugs and vaccines, the United Kingdom has set up a specialized Coronavirus Yellow Card reporting site.

Such systems work, says Jerome Kim, director-general of the International Vaccine Institute in Seoul. “You still need strong surveillance. These rare events can be important,” he says.

CAN JOE BIDEN MAKE GOOD ON HIS AMBITIOUS CLIMATE AGENDA?

The US president-elect faces an uphill battle, but there are levers he can pull to curb global warming.

By Jeff Tollefson

When Joe Biden won the US presidency last month, it seemed like a huge opportunity to restore the country’s position as a leader in the fight against climate change. But whether he’ll be able to deliver on his aggressive climate agenda remains to be seen, especially because he will face a powerful Republican opposition in Congress.

Still, climate-policy experts say that there is a lot the former senator and vice-president to Barack Obama can do, including exerting his authority over federal agencies and leveraging his experience working with both parties in the Senate to push legislation in Congress.

“This is really the first time that a US president is leading with climate,” says Vicki Arroyo, executive director of Georgetown University’s Climate Center in Washington DC. That’s exciting, she says, but suggests cautious optimism: global warming is still a partisan issue on Capitol Hill, and “that is going to limit what Biden can accomplish”.

Biden’s election comes at a crucial juncture. President Donald Trump pulled the United States out of the Paris climate agreement last month, but other players on the world stage, from China to the European Union, are preparing to present a new round of commitments at the United Nations climate conference in Glasgow, UK, next year.

Having the United States back on board will give an important boost to these negotiations, says Jean-Pascal van Ypersele, a climatologist at the Catholic University of Louvain in Louvain-la-Neuve, Belgium, and former vice-chair of the Intergovernmental Panel on Climate Change. “The stars are much better aligned for a successful outcome in Glasgow than they would have been if Trump had been re-elected.”

Biden’s first opportunity to advance his

agenda through Congress could come, as it did for Obama, in the form of an economic stimulus bill. With the US economy reeling from the pandemic, many analysts expect this to be at the top of Biden’s agenda when he enters office. His team has made climate a central feature of its economic plan and could use a stimulus package to increase federal investments in low-carbon energy and green infrastructure.

Biden’s climate platform was the most aggressive put forth by a leading presidential candidate.
ALEX WONG/GETTY

Unlike Obama, Biden will probably next look for ways to advance smaller climate measures through Congress rather than pushing sweeping legislation, because of Republican opposition. One possibility would be bipartisan legislation that creates a carbon tax to reduce US greenhouse-gas emissions — an idea that has backing among many conservatives and business leaders who are concerned about the climate. One proposal developed by the Climate Leadership Council, a non-profit organization based in Washington DC, would levy a tax on carbon dioxide emissions, starting with a modest US$40 per tonne and increasing over time, with the goal of cutting US emissions in half by 2035. The proceeds would be refunded to taxpayers.

Getting such legislation through the Senate won’t be easy, but it’s not impossible, says Bob Inglis, who heads the Energy and Enterprise Initiative, a think tank advocating politically conservative environmental solutions at George Mason University in Fairfax, Virginia. “This is a major opportunity,” says Inglis.

Regardless of what happens in Congress, many scientists and environmentalists expect that Biden will immediately use his executive authority to advance his climate agenda across the full suite of federal agencies. (It wasn’t until Obama’s second term, in 2012, that he got serious about sidestepping Congress with his authority to battle climate change.)

Under Biden, the interior department, for instance, could hasten the processing of federal permits to build offshore wind farms and other renewable-energy projects. And the Department of Energy could raise energy-efficiency standards for appliances. “There’s no need for Biden to wait,” says Tim Profeta, who leads Duke University’s Nicholas Institute for Environmental Policy Solutions in Durham, North Carolina. “There’s a lot the president can do using his own authority, starting from day one.”

Profeta co-chairs the Climate 21 Project, an independent group of academics, policy specialists and former government officials that has crafted a blueprint for executive action across 11 federal offices and agencies to address global warming. The top-line recommendation from the group is that the new administration should establish a National Climate Council led by an official who reports directly to the president. This person would help to advance Biden’s climate agenda by coordinating with various US agencies. “You need somebody in the West Wing who has the president’s ear and who is focused on making climate action happen across the federal government,” says Profeta.

The president’s most powerful tool when it comes to climate change is regulating greenhouse-gas emissions directly through the Environmental Protection Agency (EPA). Over the past four years, Trump’s EPA has reversed or weakened dozens of environmental regulations, including a trio of Obama-era climate policies targeting emissions from vehicles, power plants and oil and gas facilities. Biden is expected to move immediately to restore — and strengthen — those efforts, but this means starting again and crafting new rules.

“That is the killer, in terms of workload,” says Betsy Southerland, who spent more than three decades at the EPA before resigning in 2017 in protest against Trump. “The Biden administration is going to have to make a decision: do they laboriously reverse each one of those rules, or is there something more effective and efficient they can do?”

In the case of the policy on fuel-efficiency standards for vehicles, the administration might move forward with an entirely new rule. The Trump administration rolled back standards put in place under Obama so that the car industry has to boost average fuel efficiency by only around 1.5% per year between 2022 and 2025, instead of Obama’s 5% per year. Rather than reworking the rule, the Biden administration will probably move to develop an entirely new set of regulations that look forward another 10–15 years for longer-lasting impact, says David Doniger, strategic director of the Climate and Clean Energy Program at the Natural Resources Defense Council, an environmental group based in New York City.

The road to Glasgow

Getting an early start on implementing his climate agenda will be crucial as Biden reintegrates the country into the Paris climate agreement.

The president-elect will need to develop a climate pledge and present it to the world at next year’s conference in Glasgow, where countries are expected to update their commitments for the first time since the agreement was signed in 2015. Under Obama, the United States initially committed to cut greenhouse-gas emissions by at least 26% below 2005 levels by 2025. The challenge is to make sure the new US pledge is both strong and credible, says Joseph Aldy, an economist at Harvard University in Cambridge, Massachusetts, who served as a White House climate adviser under Obama.

“We have lost credibility on many fronts as a result of Donald Trump,” says Aldy. If Biden wants to take a leadership role in the Paris process and push other countries to do more, Aldy says, the president-elect will need to convince the global community that any regulations or legislation he puts in place are going to be effective and won’t be easily reversed in four or eight years, when a new president is elected. “Our counterparts around the world will be looking very closely at what we are doing.”
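To give a rough sense of scale for the fuel-efficiency figures quoted in this article (around 1.5% per year versus 5% per year between 2022 and 2025), the short Python sketch below compounds each annual rate. It is an illustration only. The four-step period and the assumption that the yearly improvements compound are ours, not details taken from either rule, and the function name `cumulative_gain` is hypothetical.

```python
def cumulative_gain(annual_rate: float, years: int) -> float:
    """Total fractional improvement after compounding annual_rate for `years` years."""
    return (1 + annual_rate) ** years - 1


# Assumption (not from the article): four annual steps, 2022 to 2025 inclusive,
# with each year's improvement compounding on the previous year's level.
YEARS = 4

rollback = cumulative_gain(0.015, YEARS)  # around 1.5% per year (rolled-back standard)
original = cumulative_gain(0.05, YEARS)   # 5% per year (Obama-era standard)

print(f"1.5% per year over {YEARS} years: {rollback:.1%} cumulative")
print(f"5% per year over {YEARS} years: {original:.1%} cumulative")
```

Under these assumptions the gap between the two standards comes to roughly 15 percentage points of cumulative efficiency improvement by 2025.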

THE SCIENTISTS INVESTIGATING THE PANDEMIC’S ORIGINS

The World Health Organization will draw on a diverse team to examine a major mystery about SARS-CoV-2.

By Smriti Mallapaty

SARS-CoV-2 probably originated in bats, but how it passed to people is being investigated.
MARKO KONIG/IMAGEBROKER/SHUTTERSTOCK

An epidemiologist who helped to tie the 2012 outbreak of Middle East respiratory syndrome (MERS) to camels; a food-safety officer who studies how pathogens spread in markets; and a veterinarian who found evidence linking the 2014 West Africa Ebola outbreak to bats roosting in a hollow tree. These researchers are among the team that the World Health Organization (WHO) has assembled to explore the origins of the coronavirus pandemic.

The investigation aims to find out how and when the virus SARS-CoV-2 first infected people. Strong evidence suggests that the coronavirus originated in bats, but its journey to people remains a mystery. Scientists say the team is highly qualified, but its task will be challenging.

“This is an excellent team with a lot of experience,” says Martin Beer, a virologist at the Federal Research Institute for Animal Health in Greifswald, Germany.

The group will be working with researchers in China and professionals from several other international agencies, and will start the search in Wuhan — the Chinese city where SARS-CoV-2 was first identified — and expand across China and beyond.

The international group comes with a breadth of knowledge. Marion Koopmans is a virologist specializing in molecular epidemiology at the Erasmus University Medical Centre in Rotterdam, the Netherlands. She was on the team that found, in 2013, that dromedary camels were an intermediate host for the virus that causes MERS, which has killed more than 850 people. She has since worked with another team member — Elmoubasher Farag, an epidemiologist at the Ministry of Public Health in Doha — to test camels for MERS antibodies.

During the COVID-19 pandemic, Koopmans has tracked the rapid spread of SARS-CoV-2 in mink farms in Europe. Studies on the pandemic’s origin will need to explore the role of animals kept for fur and food, she says.

Koopmans says that the group is keeping an open mind about how the pandemic started and will not exclude any scenarios, including the unlikely one that SARS-CoV-2 accidentally escaped from a laboratory. Scientists have previously told Nature that the virus is likely to have passed from bats to humans, probably through an intermediate animal — but ruling out the lab scenario will be difficult. “Anything is on the table,” says Koopmans.

Another member, Hung Nguyen, an environment and food-safety researcher at the International Livestock Research Institute in Nairobi, will contribute his knowledge on how pathogens spread in wet markets, similar to the Huanan seafood market in Wuhan, which many of the first people reported to have COVID-19 had visited. Nguyen has investigated how salmonella and other bacteria spread through smallholder farms, slaughterhouses and live-animal markets in his home country of Vietnam and across southeast Asia.

Also on the team is Peter Daszak, president of the non-profit research organization Ecohealth Alliance in New York City, who has spent more than a decade studying coronaviruses. He has worked closely with the Wuhan Institute of Virology (WIV) to test bats for coronaviruses with the potential to spill over into people. “It is an honour to be part of this team,” says Daszak. “There hasn’t been a pandemic on this scale since the 1918 flu, and we’re still close enough to the origin to really find out more details about where it has come from.”

Another team member, Fabian Leendertz, a veterinary researcher at the Robert Koch Institute in Berlin, will bring his expertise in spillover events. In April 2014, Leendertz visited Meliandou village in Guinea, months after a two-year-old died of Ebola — the first person reported to be infected in West Africa.

Work by Leendertz, including interviews with locals and environmental sampling, suggests that the outbreak started in bats that lived in a hollow tree where the children used to play. The tree was burnt down days before his arrival and no Ebola virus was detected in nearby bats, which he says highlights the difficulties of finding an outbreak’s beginnings.

Considerable time has passed since the emergence of COVID-19, and many people have only mild or no symptoms, which will make it challenging to identify the first infected person, says Leendertz.

Other team members include researchers from Denmark, the United Kingdom, Australia, Russia and Japan.

Although the team members are highly qualified, eight out of ten are men and investigators from Europe dominate the group; none is from Africa or South America, says Angela Rasmussen, a virologist at Georgetown University, who is based in Seattle, Washington. “It could be more representative of the larger global scientific community,” she says.

She also says that Daszak’s ties to the WIV could raise a conflict of interest, given the unsubstantiated claims that the virus accidentally leaked from the lab.

Daszak says that he has been transparent about his work in China. The trust he has built with researchers there will help the team to gain a deeper understanding of the pandemic’s early days, he says.

REVERSAL OF BIOLOGICAL CLOCK RESTORES VISION IN OLD MICE

‘Reprogramming’ approach seems to make old cells young again.

By Heidi Ledford

Researchers have restored vision in old mice and in mice with damaged retinal nerves by resetting some of the thousands of chemical marks that accumulate on DNA as cells age. The work, published on 2 December in Nature, suggests a new approach to reversing age-related decline, by reprogramming some cells to a ‘younger’ state in which they are better able to repair or replace damaged tissue.

“It is a major landmark,” says Juan Carlos Izpisua Belmonte, a developmental biologist at the Salk Institute for Biological Studies in La Jolla, California, who was not involved in the study. “These results clearly show that tissue regeneration in mammals can be enhanced.”

Visionary approach

Ageing affects the body in myriad ways — among them, adding, removing or altering chemical groups such as methyls on DNA. These ‘epigenetic’ changes accumulate as a person ages, and some researchers have proposed tracking the changes as a way of calibrating a molecular clock to measure biological age, an assessment that takes into account biological wear-and-tear and can differ from chronological age.

“We set out with a question: if epigenetic changes are a driver of ageing, can you reset the epigenome?” says David Sinclair, a geneticist at Harvard Medical School in Boston, Massachusetts, and a co-author of the Nature study (Y. Lu et al. Nature 588, 124–129; 2020). “Can you reverse the clock?”

There were suggestions that the approach could work: in 2016, Belmonte and his colleagues reported the effects of expressing four genes in mice genetically engineered to age more rapidly than normal (A. Ocampo et al. Cell 167, 1719–1733; 2016). It was already known that triggering these genes could cause cells to lose their developmental identity — the features that make, for example, a skin cell look and behave like a skin cell. But rather than turn the genes on and leave them that way, Belmonte’s team turned them on for only a few days, then switched them off again. The result was mice that aged more slowly, and had a pattern of epigenetic marks indicative of younger animals. But the technique had disadvantages: previous work had shown that if the genes are present in extra copies or expressed for too long, some mice will develop tumours.

In Sinclair’s lab, geneticist Yuancheng Lu looked for a safer approach. He dropped one of the four genes used by Belmonte’s team — one that is linked to cancer — and put the remaining three into a virus that could shuttle them into cells. He included a switch that would allow him to turn the genes on by giving mice water spiked with a drug. Withholding the drug would switch the genes off again.

Because mammals lose the ability to regenerate components of the central nervous system early in development, Lu and his colleagues tested their approach there — in the eye’s retinal nerves. They first injected the virus into the eye to see whether expression of the three genes would allow mice to regenerate injured nerves — something that no treatment had yet been shown to do.

Lu remembers the first time that he saw a nerve regenerating from injured eye cells. “It was breathtaking,” he says.

The team went on to show that its system improved visual acuity in mice with age-related vision loss, or with increased pressure inside the eye — a hallmark of the disease glaucoma. The approach also reset epigenetic patterns to a more youthful state in mice and in human cells grown in the laboratory. It is still unclear how cells preserve a memory of a more youthful epigenetic state, says Sinclair, but he and his colleagues are trying to find out.

In the meantime, Harvard has licensed the technology to Boston company Life Biosciences, which, Sinclair says, is carrying out preclinical safety assessments with a view to developing it for use in people. It would be an innovative approach to treating vision loss, says Botond Roska, director of the Institute of Molecular and Clinical Ophthalmology in Basel, Switzerland, but will probably need considerable refinement before it can be deployed safely in humans.

Q&A

How sex and gender analysis improves science

The European Commission has said that it aims to make sex and gender analysis mandatory in the research it funds through its €85-billion (US$100-billion) Horizon Europe programme. The strengthened policy is a result of recommendations made in a report (see go.nature.com/3mryv1a), produced last month by an expert group chaired by Londa Schiebinger, who studies gender and science at Stanford University in California. Nature spoke to Schiebinger about the group’s work.

How do you convince people of the need for sex and gender analysis in research?

Our iconic example of failure when you don’t do this analysis is that between 1997 and 2001, ten prescription drugs were withdrawn from the US market, eight of which were more dangerous for women than for men. When drugs fail, you’re losing money and people are suffering and dying. From preclinical studies to human clinical trials, you have to collect data on males and females and analyse them separately.

What mistakes do researchers make in these analyses?

The biggest mistake is simply ignoring sex, gender and intersectionality. Another is to not distinguish between biological sex and sociocultural gender. Gender is specific to ethnicity, age and culture. Researchers need to get the right variables, collect their data correctly and do the analysis well.

Are there research areas where people might be surprised that sex and gender analysis is essential?

For some marine organisms, sex is determined by temperature. Our report includes a fascinating study from Australia, where they found that the turtles in the north of the Great Barrier Reef were 99% female, whereas in the cooler south, it was about 67% female. It’s important that we understand how global warming is skewing these ratios, so that we can efficiently manage ecosystems.

Interview by Elizabeth Gibney.
Edited for length and clarity.

Feature

THE WATER PARADOX AND THE ORIGINS OF LIFE

Water is essential for life, but it breaks down DNA and other key molecules. So how did the first cells deal with such a necessary and dangerous substance? By Michael Marshall

Life might have begun in bodies of water on land, perhaps in craters similar to Canada’s Lake Manicouagan, formed by an ancient impact.
PLANET OBSERVER/UNIVERSAL IMAGES GROUP/GETTY

On 18 February next year, a NASA spacecraft will plummet through the Martian atmosphere, fire its retro-rockets to break its fall and then lower a six-wheeled rover named Perseverance to the surface. If all goes according to plan, the mission will land in Jezero Crater, a 45-kilometre-wide gash near the planet’s equator that might once have held a lake of liquid water.

Among the throngs of earthlings cheering on Perseverance, John Sutherland will be paying particularly close attention. Sutherland, a biochemist at the MRC Laboratory of Molecular Biology in Cambridge, UK, was one of the scientists who lobbied NASA to visit Jezero Crater, because it fits his ideas about where life might have originated — on Mars and on Earth.

The choice of landing site reflects a shift in thinking about the chemical steps that transformed a few molecules into the first biological cells. Although many scientists have long speculated that those pioneering cells arose in the ocean, recent research suggests that the key molecules of life, and its core processes, can form only in places such as Jezero — a relatively shallow body of water fed by streams.

That’s because several studies suggest that the basic chemicals of life require ultraviolet radiation from sunlight to form, and that the watery environment had to become highly concentrated or even dry out completely at times. In laboratory experiments, Sutherland and other scientists have produced DNA, proteins and other core components of

cells by gently heating simple carbon-based chemicals, subjecting them to UV radiation and intermittently drying them out. Chemists have not yet been able to synthesize such a wide range of biological molecules in conditions that mimic seawater.

The emerging evidence has caused many researchers to abandon the idea that life emerged in the oceans and instead focus on land environments, in places that were alternately wet and dry. The shift is hardly unanimous, but scientists who support the idea of a terrestrial beginning say it offers a solution to a long-recognized paradox: that although water is essential for life, it is also destructive to life’s core components.

Surface lakes and puddles are highly promising, says David Catling, a planetary scientist at the University of Washington in Seattle. “There’s a lot of work that’s been done in the last 15 years which would support that direction.”

Primordial soup

Although there is no standardized definition of life, most researchers agree that it needs several components. One is information-carrying molecules: DNA, RNA or something else. There must have been a way to copy these molecular instructions, although the process would have been imperfect to allow for mistakes, the seeds of evolutionary change. Furthermore, the first organisms must have had a way to feed and maintain themselves, perhaps using protein-based enzymes. Finally, something held these disparate parts together, keeping them separate from their environment.

When laboratory research into life’s origins started in earnest in the 1950s, many researchers assumed that life began in the sea, with a rich mix of carbon-based chemicals dubbed the primordial soup.

This idea was independently proposed in the 1920s by biochemist Alexander Oparin, in what was then the Soviet Union, and geneticist J. B. S. Haldane in the United Kingdom. Each imagined the young Earth as a huge chemical factory, with multitudes of carbon-based chemicals dissolved in the waters of the early oceans. Oparin reasoned that increasingly complicated particles were formed, culminating in carbohydrates and proteins: what he called “the foundation of life”.

In 1953, a young researcher named Stanley Miller at the University of Chicago in Illinois described a now-famous experiment that was seen as confirming these ideas¹. He used a glass flask holding water to mimic the ocean, and another flask containing methane, ammonia and hydrogen to simulate the early atmosphere. Tubes connected the flasks, and an electrode simulated lightning. A few days of heating and electric shocks were enough to make glycine, the simplest amino acid and an essential component of proteins. This suggested to many researchers that life arose near the surface of the ocean.

But many scientists today say there’s a fundamental problem with that idea: life’s cornerstone molecules break down in water. This is because proteins, and nucleic acids such as DNA and RNA, are vulnerable at their joints. Proteins are made of chains of amino acids, and nucleic acids are chains of nucleotides. If the chains are placed in water, it attacks the links and eventually breaks them. In carbon chemistry, “water is an enemy to be excluded as rigorously as possible”, wrote the late biochemist Robert Shapiro in his totemic 1986 book Origins, which critiqued the primordial ocean hypothesis².

This is the water paradox. Today, cells solve it by limiting the free movement of water in their interiors, says synthetic biologist Kate Adamala at the University of Minnesota in Minneapolis. For this reason, popular images of the cytoplasm (the substance inside the cell) are often wrong. “We are taught that cytoplasm is just a bag that holds everything, and everything is swimming around,” she adds. “That’s not true, everything is incredibly scaffolded in cells, and it’s scaffolded in a gel, not a water bag.”

If living things keep water controlled, then the implication, say many researchers, is obvious. Life probably formed on land, where water was only intermittently present.

Land start

Some of the key evidence in favour of this idea emerged in 2009, when Sutherland announced that he and his team had successfully made two of the four nucleotides that comprise RNA³. They started with phosphate and four simple carbon-based chemicals, including a cyanide salt called cyanamide. The chemicals were dissolved in water throughout, but they were highly concentrated, and crucial steps required UV radiation. Such reactions could not take place deep in an ocean, only in a small pool or stream exposed to sunlight, where chemicals could be concentrated, he says.

Sutherland’s team has since shown that the same starter chemicals, if they are treated subtly differently, can also produce precursors to proteins and lipids⁴. The researchers suggest that these reactions might have taken place if water containing cyanide salts was dried out by the Sun, leaving a layer of dry, cyanide-related chemicals that was then heated by, say, geothermal activity. In the past year, his team has produced the building blocks of DNA (something previously thought implausible) using energy from sunlight and some of the same chemicals at high concentrations⁵.

This approach has been extended by biochemist Moran Frenkel-Pinter at the NSF–NASA Center for Chemical Evolution in Atlanta, Georgia, and her colleagues. Last year, they showed that amino acids spontaneously linked up to form protein-like chains if they were dried out⁶. And those kinds of reaction were more likely to occur with the 20 amino acids found in proteins today, compared with other amino acids. That means intermittent drying could help to explain why life uses only those amino acids, out of hundreds of possibilities. “We saw selection for today’s amino acids,” says Frenkel-Pinter.

Wet and dry

Intermittent drying out can also help to drive these molecular building blocks to assemble into more-complex, life-like structures.

A classic experiment along these lines was published in 1982 by researchers David Deamer and Gail Barchfeld, then at the University of California, Davis⁷. Their aim was to study how lipids, another class of long-chain molecule, self-organize to form the membranes that surround cells. They first made vesicles: spherical blobs with a watery core surrounded by two lipid layers. Then the researchers dried the vesicles, and the lipids reorganized into a multi-layered structure like a stack of pancakes. Strands of DNA, previously floating in the water, became trapped between the layers. When the researchers added water again, the vesicles reformed, with DNA inside them. This was a step towards a simple cell.

“These wet–dry cycles are everywhere,” says Deamer, who is now at the University of California, Santa Cruz. “It’s as simple as rainwater evaporating on wet rocks.” But when they are applied to biological chemicals such as lipids, he says, remarkable things happen.

In a 2008 study, Deamer and his team mixed nucleotides and lipids with water, then put them through wet–dry cycles. When the lipids formed layers, the nucleotides linked up into RNA-like chains, a reaction that would not happen in water unaided⁸.

Other studies are pointing to a different factor that seems to be a key part of life’s origins: light. That’s one of the conclusions coming from the team of synthetic biologist Jack Szostak at Massachusetts General Hospital in Boston, which works with ‘protocells’: simple versions of cells that contain a handful of chemicals, but can grow, compete and replicate themselves. The protocells display more-lifelike behaviours if they are exposed to conditions similar to those on land. One study, on which Adamala was a co-author, found that the protocells could use energy from light to divide, in a simple form of reproduction⁹.

In experiments in the 1950s, Stanley Miller created amino acids from simple building blocks. BETTMANN/GETTY

Similarly, Claudia Bonfio, now also at the MRC Laboratory of Molecular Biology, and her colleagues showed in 2017 that UV radiation drives the synthesis of iron–sulfur clusters¹⁰, which are crucial to many proteins. These include those in the electron transport chain, which helps to power all living cells by driving the synthesis of the energy-storage molecule ATP. The iron–sulfur clusters would break apart if they were exposed to water, but Bonfio’s team found they were more stable if the clusters were surrounded by simple peptides 3–12 amino acids long.

Water, but not too much

Such studies have given momentum to the idea that life began on a well-lit surface with a limited amount of water. However, there is still debate over how much water was involved, and what part it played in starting life.

Like Deamer, Frenkel-Pinter argues that wet–dry cycles were crucial. Dry conditions, she says, provided an opportunity for chain molecules such as proteins and RNA to form.

But simply making RNA and other molecules is not life. A self-sustaining, dynamic system has to form. Frenkel-Pinter suggests that water’s destructiveness could have helped to drive that. Just as prey animals evolved to run faster or secrete toxins to survive predators, the first biological molecules might have evolved to cope with water’s chemical attacks, and even to harness its reactivity for good.

This year, Frenkel-Pinter’s team followed up on its previous study⁶ showing that drying caused amino acids to link up spontaneously. The team found that their proto-proteins could interact with RNA, and that both became more stable in water as a result¹¹. In effect, water acted as a selection pressure: only those combinations of molecules that could survive in water would continue, because the others would be destroyed.

The idea is that, with each cycle of wetting, the weaker molecules, or those that could not protect themselves by binding to others, were destroyed. Bonfio and her team demonstrated this in a study this year¹², in which they attempted to convert simple fatty acids into more-complex lipids resembling those found in modern cell membranes. The researchers created mixtures of lipids, and found that the simple ones were destroyed by water, while the larger, more complex ones accumulated. “At some point, you would have enough of these lipids for them to form membranes,” she says.

In other words, there might be a Goldilocks amount of water: not so much that biological molecules are destroyed too quickly, but not so little that nothing changes.

Warm little ponds

Where might all this have happened? On this point, there is a generational divide in the field. Many senior researchers are committed to one scenario or another, whereas younger researchers often argue that the question is wide open.

The open ocean is unviable, says Frenkel-Pinter, because there is no way for chemicals to become concentrated. “That’s really a problem,” agrees Bonfio.

An alternative marine idea has been championed since the 1980s by geologist Michael Russell, an independent researcher formerly at the Jet Propulsion Laboratory in Pasadena, California. Russell argues that life began in vents on the seabed, where warm alkaline water seeps up from geological formations below. Interactions between warm water and rocks would provide chemical energy that would first drive simple metabolic cycles, which would later start making and using chemicals such as RNA.

Russell is critical of Sutherland’s approach. “He’s doing all these fantastic bits of chemistry,” he says, but for Russell, none of it is relevant. That’s because modern organisms use completely different chemical processes to make substances such as RNA. He argues that these processes must have arisen first, not the substances themselves. “Life, it picks very particular molecules. But you can’t pick them from the bench. You’ve got to make them from scratch and that’s what life does.”

Sutherland counters that once RNA, proteins and so forth had formed, evolution would have taken over and enabled proto-organisms to find new ways to make these molecules and thus sustain themselves.

Meanwhile, many researchers have expressed scepticism about Russell’s alkaline-vent hypothesis, arguing that it lacks experimental support.

By contrast, chemical experiments that simulate surface conditions have made the building blocks of nucleic acids, proteins and lipids. “None of that synthesis exists in that deep-sea hydrothermal vent hypothesis. It just simply hasn’t been done, and possibly because it can’t be done,” says Catling.

Frenkel-Pinter is also critical of the vent idea, because the molecules she works with wouldn’t survive long in those conditions. “The formation of these protopeptides is not very compatible with hydrothermal vents,” says Frenkel-Pinter.

A possible solution was proposed in May by geochemist Martina Preiner, a postdoc at the University of Düsseldorf in Germany, and her colleagues. She argues that in the rocks beneath hydrothermal vents, heat and chemical reactions bind up water molecules

or break them apart, creating dry spaces¹³. “There are rock–water interactions getting rid of the water to a certain extent,” she says. Intermittently, more seawater would trickle in, giving “something like a wet–dry cycling”. This ought to make the deep-sea rocks much more suitable for the formation of key molecules, argues Preiner, although she acknowledges this is still a hypothesis. “Of course, you still have to do the according experiments to prove that this could do certain reactions.”

At present, however, that evidence doesn’t exist. Meanwhile, experimental support is growing for the idea that life started in small bodies of water on land.

Sutherland favours a meteorite impact crater, heated by the Sun and by the residual energy of the impact, with multiple streams of water running down the sloping sides, and finally meeting in a pool at the bottom. This would have been a complex, 3D environment with mineral surfaces to act as catalysts, where carbon-based chemicals could have been alternately dissolved in water and dried out in the Sun. “You can say with some degree of confidence we need to be on the surface, we can’t be deep in the ocean or 10 kilometres down in the crust,” says Sutherland. “Then we need phosphate, we need iron. A lot of those things are very easily delivered by iron–nickel meteorites.” The impact scenario has a further advantage: meteorite impacts shock the atmosphere, producing cyanide, says Sutherland.

Deamer has long championed a different suggestion: volcanic hot springs. In a study this year, he and his colleague Bruce Damer argued that lipids would have formed protocells in the hot waters¹⁴, as his earlier experiments indicated. The wet–dry cycles on the edges of the pools would have driven the formation and copying of nucleic acids such as RNA.

Deamer has conducted several experiments in modern volcanic hot springs to test his ideas. In 2018, his team showed that vesicles could form in hot spring water¹⁵, and even enclose nucleic acids, but they would not form in seawater. A follow-up study last year found that when the resulting vesicles were dried, nucleotides linked up to form RNA-like strands¹⁶.

Narrowing down the location where life started will require understanding of the broader picture of prebiotic chemistry: how the many reactions fit together, and the ranges of conditions under which they occur. That mammoth task has been attempted by a group led by chemist Sara Szymkuć, president of the start-up firm Allchemy in Highland, Indiana. The team published a comprehensive study in September that used a computer algorithm to explore how a vast network of known prebiotic reactions could have produced many of the biological molecules used in life today¹⁷.

The network was highly redundant, so key biological compounds could still form even if multiple reactions were blocked. For this reason, Szymkuć argues that it is too early to rule out any of the scenarios for where life originated. That will require systematically testing a range of different environments, to see which reactions occur where.

NASA’s Perseverance rover will search for signs of life in Jezero Crater on Mars. ESA/FU-BERLIN

Beyond Earth

If experiments such as Sutherland’s do point the way to how life began on Earth, they can also help to explore where life might have started elsewhere in the cosmos.

Mars has attracted the most attention, because there is clear evidence it once had liquid water on its surface. The landing site for NASA’s Perseverance rover, the Jezero Crater, was chosen in part because it seems to have once been a lake, and could have hosted the chemistry Sutherland has studied. He helped to write a 2018 presentation to NASA led by Catling, which summarized the prebiotic chemistry findings and advised on where Perseverance should look. “We presented this chemistry and said this Jezero Crater, which is the one they eventually chose, is the one where there was the highest likelihood of this chemistry playing out,” says Sutherland.

It will be two months before Perseverance reaches Mars, and years before the samples it collects are returned to Earth by an as-yet-unnamed future mission. So, there is still a long wait before we find out whether Mars harbours life, or if it did so billions of years ago. But even if it did not, it might reveal traces of prebiotic chemistry.

The best case, says Catling, is that Perseverance finds complicated carbon-based molecules in the layers of Martian sediment, such as lipids or proteins, or their degraded remains. He also hopes for evidence of wet–dry cycles. This might come in the form of carbonate layers that formed when a lake dried and refilled many times. He suspects that “life didn’t get particularly far on Mars”, because we haven’t seen any obvious signs of it, such as clear fossils or carbon-rich black shales. “What we’re looking for is pretty simple, maybe even to the point of being prebiotic rather than the actual cells themselves.”

It could be that Mars took only the first few chemical steps towards life, and did not go all the way. In that case, we might find fossils: not of life, but of pre-life.

Michael Marshall is a science writer based in Devon, UK, and the author of The Genesis Quest.

1. Miller, S. L. Science 117, 528–529 (1953).
2. Shapiro, R. Origins: A Skeptic’s Guide to the Creation of Life on Earth (Summit, 1986).
3. Powner, M. W., Gerland, B. & Sutherland, J. D. Nature 459, 239–242 (2009).
4. Patel, B. H., Percivalle, C., Ritson, D. J., Duffy, C. D. & Sutherland, J. D. Nature Chem. 7, 301–307 (2015).
5. Xu, J. et al. Nature 582, 60–66 (2020).
6. Frenkel-Pinter, M. et al. Proc. Natl Acad. Sci. USA 116, 16338–16346 (2019).
7. Deamer, D. W. & Barchfeld, G. L. J. Mol. Evol. 18, 203–206 (1982).
8. Rajamani, S. et al. Orig. Life Evol. Biosph. 38, 57–74 (2008).
9. Zhu, T. F., Adamala, K., Zhang, N. & Szostak, J. W. Proc. Natl Acad. Sci. USA 109, 9828–9832 (2012).
10. Bonfio, C. et al. Nature Chem. 9, 1229–1234 (2017).
11. Frenkel-Pinter, M. et al. Nature Commun. 11, 3137 (2020).
12. Bonfio, C., Russell, D. A., Green, N. J., Mariani, A. & Sutherland, J. D. Chem. Sci. 11, 10688–10697 (2020).
13. do Nascimento Vieira, A., Kleinermanns, K., Martin, W. F. & Preiner, M. FEBS Lett. 594, 2717–2733 (2020).
14. Damer, B. & Deamer, D. Astrobiology 20, 429–452 (2020).
15. Milshteyn, D., Damer, B., Havig, J. & Deamer, D. Life 8, 11 (2018).
16. Deamer, D., Damer, B. & Kompanichenko, V. Astrobiology 19, 1523–1537 (2019).
17. Wołos, A. et al. Science 369, eaaw1955 (2020).

Science in culture
Books & arts

Implosion of a billion-euro brain model: the movie

Annual footage offers tantalizing glimpses inside a troubled European flagship project. By Alison Abbott

In Silico
Director: Noah Hutton
Sandbox Films (2020)

In Silico focuses on Henry Markram’s attempts to model rodent and human brains. SANDBOX FILMS

In October 2013, I attended the launch of the Human Brain Project in Lausanne, Switzerland, as correspondent for Nature. I hoped to leave with a better understanding of the exact mission of the baffling billion-euro enterprise, but I was frustrated. Things became clear the following year, when the project fell spectacularly, and very publicly, apart.

Noah Hutton’s documentary In Silico captures a sense of what it was like behind the scenes of the project, which was supported with great fanfare by the European Commission. It had been hyped as a quantum leap in understanding how the human brain works. Instead, it left a trail of angry neuroscientists across Europe. Yet aspects of what went so expensively wrong still remain elusive.

In Silico is more about the back story of the Human Brain Project (HBP). Hutton was 22 years old when he watched a 2009 talk by Henry Markram, the controversial figure who later became the first director of the HBP. Markram was speaking about the Blue Brain Project, a major initiative he had launched a few years before at one of Europe’s top universities, the Swiss Federal Institute of Technology in Lausanne, with generous funding from the Swiss government. He claimed that he would, with the help of a supercomputer related to the one that beat world chess champion Garry Kasparov in 1997, simulate an entire rodent brain within a decade. He planned to build it from information about the brain’s tens of millions of individual neurons.

Entranced, Hutton sought permission to film with the project annually over those ten years. He had no idea that he would end up tracking one of the twenty-first century’s most explosive scientific ventures. Nor that the ten-year horizon would never get closer.

Rise and fall

In 2010, the first year of filming, Hutton captures Markram’s boastful mood: “I believe we will understand the brain before we even finish building it.” In 2011, Blue Brain ran a simulation that for the first time generated something the team hadn’t programmed: a wave that seemed to mimic the spontaneous, synchronized electrical activity in real brains. “This is it,” gasps Markram.

But that year, Hutton also started to encounter critics in the neuroscience community. They claimed that the simulation project was premature because too little was known about the different types of neuron in the brain and how they were wired. Anyone can repair a broken watch by putting its known components in the right places, neuroscientist Zachary Mainen at the Champalimaud Centre for the Unknown in Lisbon, Portugal, tells the camera. Try this with the incompletely understood components of the brain, he says, “and you’ll end up with a bunch of parts that doesn’t tell the time”.

As for that real-looking wave? Sebastian Seung, now at Princeton University in New Jersey, says: “How would you know if that activity pattern was right or wrong?”

When Hutton visited the next year, Markram ticked him off for contacting critics without informing him. The commission was deciding which two projects would become its billion-euro Future and Emerging Technologies Flagships and Markram didn’t want any controversy to upset his chances.

The film suggests (as other commentators have) that Markram saw the flagship programme as a means to expand Blue Brain. But to win the money, it had to be more than that. He had to team up with top scientists in other European Union countries to present an interdisciplinary collaboration. He persuaded some initially sceptical cognitive neuroscientists to join. Their job, it was understood, would

be to ensure that brain simulations would be linked to behavioural outcomes, so they would always know whether any simulated activity was ‘right or wrong’. The film only scratches the surface of this thorny issue, although it is central to the scientific controversy.

What comes across more strongly is how Markram’s frequent overblown claims for the simulation projects (that they would obviate the need for animal experiments, for example) irritated many in the community. “Henry has two personalities,” says Christof Koch, president of the Allen Institute for Brain Science in Seattle, Washington. “One is a fantastic, sober scientist … the other is a PR-minded messiah.”

Markram’s answers to these charges on camera are often evasive; his critics, he says, simply don’t accept an unconventional way of doing science.

Internal tensions

Early optimism is quickly strained, as project members are sidelined. Hutton returns to find that just nine months after the launch, Mainen and some colleagues had written a public letter calling on the commission to rethink the project, claiming that autocratic management was distorting its mission. The letter attracted around 800 signatories from neuroscientists globally. (Two years later, they set out an alternative approach in this journal: Z. F. Mainen et al. Nature 539, 159–161; 2016).

By 2016, Markram had been removed from the leadership (see Nature https://doi.org/fkgx; 2015). The final two years of filming follow him back on Blue Brain. The simulation progresses, the 3D visualizations get more impressive, research papers emerge, but the project’s pep seems to drain away. Markram’s insistence that a complete brain simulation is still just ten years away sounds hollow. Meanwhile, the HBP continues with a more distributed, democratic structure.

In Silico is a fascinating window into the trouble grandiose research projects and grandiose personalities can generate, even if it fails to get to the heart of what specifically went wrong with the HBP. Hutton hints that the disputes were driven by money. I disagree; my sense is that it came down to leadership style and irresolvable differences in scientific opinion. There is a bolder, even more interesting, story waiting to be told.

Alison Abbott writes from Munich, Germany.
e-mail: alison.abbott.consultant@springernature.com

Books in brief

Wall Disease
Jessica Wapner The Experiment (2020)
Since the Berlin Wall fell in 1989, border walls have multiplied, notes science journalist Jessica Wapner in her compelling, dispiriting, global survey. In the decade after the September 2001 terrorist attacks, 47 appeared worldwide; Wapner investigates their geography and psychological effects. “Wall disease” (a translation of Mauerkrankheit, coined in 1973 by a former Berlin psychiatrist who had abandoned East Germany for the West) consists of fear, isolation, a sense of immobility, financial insecurity and suspicion of “the other” on the far side.

The Brutish Museums
Dan Hicks Pluto (2020)
This timely book echoes the British Museum’s decision this year to redisplay a bust of its founder with labels about his links to the slave trade. Dan Hicks is a curator at the Pitt Rivers Museum in Oxford, UK, which, like the British Museum, holds many prized objects murderously looted by colonial forces in 1897 from Benin, in what is now Nigeria. Rejecting the view of Oxford colleague John Boardman that “the rape proved to be a rescue”, Hicks vehemently advocates that “brutish” museums urgently begin restitution of stolen objects.

What Is a Complex System?
James Ladyman & Karoline Wiesner Yale Univ. Press (2020)
The Santa Fe Institute in New Mexico inaugurated the study of complex systems, but its founding workshops in 1984 did not define the topic. Even today there is no agreement on a definition, nor whether one is possible, remark philosopher of science James Ladyman and mathematician Karoline Wiesner. After a clear analysis of systems ranging from radiation to human brains, they conclude: there is no “single natural phenomenon of complexity”, but ‘complexity science’ does exist, rather than being “merely branches of different sciences”.

A Manual of the Mammalia
Douglas A. Kelt & James L. Patton Univ. Chicago Press (2020)
The subtitle of this comprehensive, lavishly illustrated reference book terms it “an homage” to Timothy Lawlor’s acclaimed Handbook to the Orders and Families of Living Mammals, which was published in 1979, revised, but out of date following Lawlor’s death in 2011. As wildlife ecologist Douglas Kelt and mammal curator James Patton note, Lawlor’s final edition featured about 4,170 species of mammal; today’s figure is 6,495. “Do not be overwhelmed”, they advise students, “simply revel in the diversity that is the Mammalia.” Andrew Robinson

Yellowstone Wolves
Eds Douglas W. Smith et al. Univ. Chicago Press (2020)
Twenty-five years ago, the authors reintroduced wolves to Yellowstone National Park in Wyoming, the first deliberate return of an apex carnivore to a big ecosystem. Here, they relate what they’ve learnt of the animals’ predation, mating, play, genetics, disease and more, and their impact on other species and the landscape. Also detailed are the fraught history, politics and implications of rewilding. Glorious pictures bear witness to fragile gains. US President Donald Trump’s silver-anniversary gift? Rolling back protections on the wolves. Sara Abdulla

Setting the agenda in research
Comment
Rescue Brazil’s burning Pantanal wetlands

Renata Libonati, Carlos C. DaCamara, Leonardo F. Peres, Lino A. Sander de Carvalho & Letícia C. Garcia

Climate extremes, poor management and lax laws are making this World Heritage Site prone to fierce fires. Researchers and governments must develop a plan to manage these risks together.

Smoke rises from a fire in Brazil’s Pantanal, the world’s largest tropical wetland, in September. BUDA MENDES/GETTY

Brazil has changed. As well as the COVID-19 pandemic killing more than 170,000 of its citizens so far, 2020 has seen almost one-third of the Pantanal, the largest tropical wetland in the world, on fire. Four million hectares of forest, savannah and shrub-land (an area bigger than the US state of Maryland) have gone up in flames since January (see go.nature.com/2jtw6va). Almost all the Indigenous territories and conservation facilities were burnt, as was much private land. Conservation areas such as Encontro das Águas State Park have been devastated; it contained one of the largest populations of jaguars in the world.

Fires’ impacts have been felt nationwide. Smoke has spread thousands of kilometres, reducing air quality in São Paulo, Rio de Janeiro and Curitiba. Southern states have experienced showers of black rain. The fires are decimating Brazil’s economy, curbing inward investment as well as sectors such as

air travel and tourism that are already hit hard by the pandemic.

The public is worried. The fires have made the headlines for months. Thousands of Brazilians have volunteered to fight the flames, rescue wildlife or donate money. Yet, Brazil’s government is doing little. It is ignoring the causes of the fires: a combination of inadequate fire management, climate extremes, human behaviour and weak environmental regulations. Worse, it has slashed funding for fire prevention and has been slow to contract firefighters. It has even cast doubts on the reliability of satellite fire detections.

On the scientific front, fire risks and impacts in the region are under-studied. Deeper research is needed on the weather conditions that fan fires, as well as the influences of ecology and management. Scientists need to know how the many factors behind large fires interact, including vegetation stress, extreme weather and human activities. And more studies are needed to inform fire management strategies in the region.

This year’s fire season in the Pantanal is exceptional. But the conditions that led to these blazes are becoming increasingly common as the area warms. In response, political, socio-economic and scientific approaches need to change. Researchers and governments need to come together to develop a comprehensive strategy for preventing and managing fires. Otherwise this great tropical wilderness will not bounce back.

Devastating impacts

With more than 84% of its territory conserved, the Pantanal is the largest remaining wetland area of natural vegetation in the world. It’s a UNESCO World Heritage Site. Indigenous, riverine and quilombo communities live there. Traditional farmers practise unique forms of sustainable agriculture, including grazing cattle on native pastures and moving animals to higher land when lowlands flood. Tourists flock to the region for its spectacular scenery, safaris and sport-fishing.

Each rainy season, from October to April, pulses of floods swell the Paraguay River to support ecosystems found nowhere else on Earth. Endangered jaguar, giant otter, marsh deer and hyacinth macaws roam wild. Thousands of birds pass through on their migrations (ref. 1). It’s a haven for caimans, capybaras, monkeys, deer, coatis, tapirs, snakes and the jabiru stork (Jabiru mycteria), the region’s symbol.

The fires have affected all aspects of life. COVID-19 has made things worse. PREVFOGO, the national centre for forest fire prevention and fighting, has struggled to hire and train firefighters. Many fires broke out in remote regions, even underground, that were hard to reach. Local firefighters in the Kadiwéu territory, for example, struggled almost alone to push back exceptionally fierce flames (see ‘Pantanal fire crisis’).

The total loss will take months to calculate. But the impacts are long lasting. Charcoal and ash contaminate rivers and promote harmful bacteria that poison drinking-water supplies and kill fish. Eroded soils are flushed downstream. Fire-sensitive plants struggle to produce seeds. Vast tracts of land will need to be assessed to understand whether they can be restored. Communities will have to be rebuilt.

[Figure: PANTANAL FIRE CRISIS. Almost one-third of this wetland World Heritage Site burnt down in 2020. Drought, climate change, inadequate government response and the pandemic all played a part. Map of the Pantanal and the Paraguay River (Brazil, Bolivia, Paraguay) showing conservation areas, Indigenous territories and fires in 2019, 2020 and both years. Map annotations: Encontro das Águas State Park contains the most jaguars in the Pantanal. 100% of Matogrossense National Park was burnt; it’s one of South America’s premier wetlands for waterbirds. Indigenous Kadiwéu people are trained to fight fires in their territory. Chart: fire danger ratings are rising as the region warms; 2020 saw the worst conditions in three decades. Axes: difficulty of controlling fires (DSR index*) against year, 1980 to 2020. *Averaged daily severity rating (DSR) from January to August of each year for the Pantanal biome. SOURCE: LABORATORY FOR ENVIRONMENTAL SATELLITE APPLICATIONS, FED. UNIV. RIO DE JANEIRO]

Growing risk

What lies behind these fires? The Pantanal is no stranger to burning, even though it is a wetland (ref. 2). For half the year it is dry and prone to catching alight, especially during drought. Sometimes lightning causes the spark. More often it is human-related: flashes from electrical cables, burning garbage and wood from cattle fences, use of fire to ward off bee attacks when collecting honey, and even car crashes and damaged agricultural machinery. Cattlemen burn the landscape to remove shrubs and stimulate the growth of native grasses, which are adapted to fire and sprout after pruning or scorching. Such fires regularly get out of control, especially in areas where there’s no system for managing them (ref. 3).

The frequency and severity of fire outbreaks are worsening, as the climate warms and human impacts increase. Since 1980, average temperatures there have risen by 2 °C and humidity has fallen by 25%, according to the European Centre for Medium-range Weather Forecasts (ECMWF). This year saw the worst drought recorded in the Pantanal in 60 years (see go.nature.com/2jpdubc), induced by unusually warm waters in the North Atlantic (ref. 4). The wet season saw 57% less rain than normal. By June, the Paraguay River was at half its usual

© 2020 Springer Nature Limited. All rights reserved.
level. This combination of hot, dry conditions pushed flammability thresholds to their highest since 1980. Such thresholds indicate the difficulty of controlling fires, a scale that is quantified using the averaged daily severity rating (DSR) index, which is derived from ECMWF data. Deforestation in Amazonia has also been linked to reduced rainfall in the Pantanal, although this is debated.

Environmental regulations are failing to keep up (ref. 5). In July, Brazil’s government issued a 120-day ban on the use of fire in the Amazon and the Pantanal. It seems to have been widely ignored. The government has denied responsibility, blaming Indigenous peoples and traditional communities for starting fires, and criticized campaigns by the media and non-governmental organizations highlighting the exceptionality of the fire season.

Resources for environmental protection and climate actions have been slashed, especially in the past two years. The Ministry of the Environment’s US$630-million budget was cut by around 20% in 2020 and looks set to fall by a further 35% in 2021. Brazil is also failing to meet its commitment to reduce greenhouse-gas emissions under the Paris climate agreement (ref. 6). Licensing requirements for dams, roads and mines have been weakened (Nature 572, 161–162; 2019). Last year, to promote agricultural and biofuel production, the government revoked the law, in place since 2009, that prohibited new sugar-cane plantations in the Amazon and the Pantanal (ref. 7). The decree was provisionally suspended by the Brazilian federal court in April, and is awaiting a final decision.

Researchers need to bolster evidence to back a new approach. Until now, most studies in the Pantanal have focused on a single discipline, plant ecology for example. Research on other topics, such as climate, isn’t granular enough. There are few studies of the human causes of, and responses to, fires in the Pantanal that could inform fire-management strategies. A full understanding of cycles of burning and long-term trends is missing.

Fire science is multidisciplinary, spanning fields from climate to chemistry, ecology to economics, as well as risk analysis and computational modelling. A task force is needed to bring together researchers from all these areas, along with technicians working in the field.

Neglecting the connections between climate, land use and fire management will make it impossible to restore the Pantanal to its former state, let alone protect the region in the future. Any change to the natural pattern of burning disrupts ecosystems and food chains, sometimes completely. For instance, jaguars will struggle to find herbivores to eat, if the latter are killed by flames or are unable to find fruits and leaves in a scorched landscape. Generations of fire-sensitive trees could be lost, including Genipa americana (ref. 3), fruits of which are a staple for fauna and used by Indigenous people to make black ink for body painting.

The impacts cascade quickly. Repeated wildfires lower the resilience of communities and vegetation; forests are replaced by open landscapes with fewer resources.

Economic fallout

Brazil must act on deforestation and forest fires to protect its economy. After earlier fires in 2019, Norway and Germany froze their donations to the Brazilian government’s Amazon Fund, after having contributed more than $1.2 billion and $68 million, respectively. Around 250 investors, including the California Public Employees’ Retirement System (CalPERS), representing approximately $17.7 trillion in assets, endorsed an open letter pointing out the financial impacts that deforestation may have on investee companies (see go.nature.com/36gzirt).

In June, 7 European investment firms, managing $2 trillion in assets ($5 billion linked to Brazil), announced they might divest from beef producers, grain traders and government bonds in Brazil if there was no progress in stopping deforestation and fires. Soon after, 34 companies (including the Church of England and KPL, Norway’s pension fund, managing around $4 trillion) wrote to Brazilian embassies in their countries (including Norway, Sweden, France, Denmark, the Netherlands, the United States and the United Kingdom) expressing concerns over the dismantling of environmental policies in Brazil.

European countries (France, Austria and the Netherlands) threaten not to ratify the provisional trade deal between the European Union and the Mercosur bloc (comprising Brazil, Argentina, Uruguay and Paraguay), unless Brazil achieves its Paris climate commitments. The EU–Mercosur agreement was negotiated for 20 years and is considered the largest free-trade agreement in history. It accounts for $20 trillion of global gross domestic product (GDP), about one-quarter of the world’s economy, and the consumer market in the 32 countries reaches 780 million people. Currently, Brazilian companies export almost $20 billion to the EU; the deal would lead to an increase of $100 billion for Brazil’s GDP by 2035.

Steps forward

Brazil’s government must develop a long-term strategy to mitigate damage from wildfires in the Pantanal that takes all factors into account, including effective fire management and environmental protection policies.

Researchers need to shore up knowledge about the fire regime there to inform this strategy.

First, gather satellite and other data about the time, location and intensity of fires, burnt area and vegetation conditions before and after. This information can then be used to assess factors behind the onset and spreading of fires. Scientists should model the impacts of current and future land use and climate change on fire events, as well as feedbacks, such as those between biomass burning and global warming.

Second, model fire management and response strategies, including the impacts on biota, pasture, communities, economies, ecology, weather and fire risk. Fire managers need to decide which areas to protect and which activities to prohibit, taking into account scientific, Indigenous and local knowledge.

Some areas could be kept fire free, or have carefully managed blazes outside the dry season to protect biodiversity. Other areas might accommodate agriculture, cattle or tourism, as long as fire-management principles, as well as state and federal legislation on environmental protection (such as the 2012 Brazilian Forest Code), are followed. Near-real-time information about the location, intensity and spread of wildfires in the Pantanal should be disseminated, along with daily forecasts of fire danger.

Funding should be directed towards fire management and environmental protection, as well as to law enforcement and fine collecting by environmental inspectors.

Education and information programmes in schools or by the media would make the population more aware of the consequences of irresponsible behaviour.

A warming and fast-changing world demands a new proactive approach to fighting wildfires.

The authors

Renata Libonati and Lino A. Sander de Carvalho are adjunct professors of climatology and remote sensing, and Leonardo F. Peres is an associate professor of atmospheric sciences and remote sensing at the Federal University of Rio de Janeiro, Brazil. Carlos C. DaCamara is an associate professor of climate science at the University of Lisbon, Portugal. Letícia C. Garcia is an adjunct professor of restoration ecology at the Federal University of Mato Grosso do Sul, Brazil.
e-mail: renata.libonati@igeo.ufrj.br

1. De Pinho, J. B., Aragona, M., Hakamada, K. Y. P. & Marini, M. Â. Bird Conserv. Int. 27, 371–387 (2017).
2. Pivello, V. R. Fire Ecol. 7, 24–39 (2011).
3. Pott, A. & Pott, V. J. Wetl. Ecol. Manag. 12, 547–552 (2004).
4. Thielen, D. et al. PLoS ONE 15, e0227437 (2020).
5. Abessa, D., Famá, A. & Buruaem, L. Nature Ecol. Evol. 3, 510–511 (2019).
6. da Silva Junior, C. A. et al. Sci. Rep. 10, 16246 (2020).
7. Ferrante, L. & Fearnside, P. M. Science 359, 1476 (2018).
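
The article gauges the difficulty of controlling fires with the averaged daily severity rating (DSR). As a rough illustration only, the sketch below uses the standard definition from the Canadian Forest Fire Weather Index System, DSR = 0.0272 × FWI^1.77, and averages it over January to August of one year, as the figure footnote describes. Whether the authors' ECMWF-derived index follows exactly this formula is an assumption; the function names and sample values are hypothetical.

```python
from datetime import date
from statistics import mean
from typing import Mapping


def daily_severity_rating(fwi: float) -> float:
    """Convert a daily Fire Weather Index (FWI) value into a DSR value.

    Assumes the standard Canadian FWI System power law; the article does
    not state the exact formula behind its ECMWF-derived index.
    """
    if fwi < 0:
        raise ValueError("FWI cannot be negative")
    return 0.0272 * fwi ** 1.77


def averaged_dsr(daily_fwi: Mapping[date, float], year: int,
                 first_month: int = 1, last_month: int = 8) -> float:
    """Average the DSR over the chosen months of one year (default Jan-Aug)."""
    values = [
        daily_severity_rating(fwi)
        for day, fwi in daily_fwi.items()
        if day.year == year and first_month <= day.month <= last_month
    ]
    if not values:
        raise ValueError("no FWI values in the requested window")
    return mean(values)


# Hypothetical FWI values, for illustration only (not data from the article).
sample = {
    date(2020, 1, 15): 5.0,
    date(2020, 8, 15): 30.0,
    date(2020, 9, 15): 40.0,  # outside January-August, so ignored
}
season_average = averaged_dsr(sample, 2020)
```

Because the exponent is greater than 1, days with a very high FWI weigh heavily on the seasonal average, which is one reason an index of this kind is used as a proxy for how hard fires are to control.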

Readers respond
Correspondence

Another diversity problem: scientists’ politics

According to your poll before the US presidential election (see Nature 586, 654; 2020), the political leaning of scientists was 86% in favour of Democrat Joe Biden, now president-elect, with just 8% supporting Republican Donald Trump, the outgoing president. However, this finding is glaringly out of step with the voting of the population from which the US scientists were drawn (about 51% versus 47%, respectively).

This misalignment could be attributed to differences in education, understanding and awareness of the issues at stake. But such a gulf risks isolating science further from society at a time when we should be building bridges beyond this election.

As academics become more aware of the importance of diversity of thought, we must be careful not to recreate different forms of the old elitist patterns of collective behaviour recently challenged by anti-racism. Any association of science with political archetypes could turn some against it by enhancing the view that it is an exclusive pursuit.

Andrew Isaac Meso, King’s College London, UK.
andrew.meso@kcl.ac.uk

Land use predicts pandemic disparities

COVID-19 morbidity is linked to social, economic and environmental factors, including residential location, air pollution and median household income (H. A. Washington Nature 581, 241; 2020). These have an overlapping determinant that could prove to be an important predictor of COVID-19 disparities: land use.

The United States has a strained history of land use and land governance, including ethnic constraints on land ownership and unfair mortgage-lending practices. Decisions on land-use classification have led to hazardous and polluting facilities being sited next to minority and other vulnerable residential communities. Despite policies enacted in 1968 to protect against housing discrimination (go.nature.com/39v1bt3), the United States is witnessing a correlation of historical ‘redlining’ (the systematic denial of services to residents of certain areas, on the basis of race or ethnicity) with COVID-19 incidence today.

It is crucial that land-use practices are considered when making public-health management decisions. This could help to mitigate the multi-generational, compounding impacts of isolated or confined residential spaces. Those who live in such areas will continue to take a disproportionate hit unless land-use equity is made a priority in governance.

Cesunica Ivey, University of California, Riverside, USA.
cesunica@ucr.edu

What counts as climate finance? Define urgently

To resolve arguments over what funding actually flows from developed to developing nations, the United Nations Framework Convention on Climate Change needs to draw up a definition of what constitutes climate finance.

At the 2009 UN climate summit, developed countries pledged to mobilize US$100 billion annually by 2020 to help developing countries mitigate and adapt to climate change. Has the promise been met? The answer to this question will be available only in “the first quarter of 2022 at the earliest”, according to a report published last month (go.nature.com/2kdeklu) by the Organisation for Economic Co-operation and Development (OECD), a club of wealthy countries.

Letting the OECD decide what counts as climate finance on the world’s behalf risks introducing questionable accounting practices (see R. Weikmans and J. T. Roberts Clim. Dev. 11, 97–111; 2019). The OECD, for example, continues to account for loans at face value, which equates a $10-million loan (which has to be paid back) to a $10-million grant. It is therefore no surprise that developing countries have found OECD reports unacceptable before (see Nature 573, 328–331; 2019).

Romain Weikmans, Free University of Brussels, Belgium.
romain.weikmans@ulb.be

J. Timmons Roberts, Brown University, Providence, Rhode Island, USA.

Stacy-ann Robinson, Colby College, Waterville, Maine, USA.

Combine resilience and efficiency in post-COVID societies

As countries prepare to remodel themselves after the COVID-19 pandemic, they must tackle growth and development expectations by using resources more sustainably, and by ensuring that their societies are better placed to weather future disruptions.

The COVID-19 experience indicates that society could become more vulnerable to systemic shocks and cascading disruption if the practices on which it depends excessively prioritize system efficiency over resilience. Efficiency emphasizes performance at maximum capacity with minimal use of scarce resources. To meet the rising demands of society, efficiency-based approaches often rely on increasingly complex and interconnected systems. But when a tightly interdependent society encounters acute or chronic stressors beyond its expectations or operating capabilities, such highly efficient systems are prone to catastrophic failure that can delay or prevent recovery.

More-resilient systems might be less efficient, but they recover better from systemic disruptions. Building resilience does not mean abandoning efficiency, but rather maximizing socio-economic systems’ long-term sustainability in the face of future disruptions. Marrying resilience with efficiency would allow society to preserve or even improve living standards in current and future crises.

Benjamin D. Trump, Igor Linkov, US Army Corps of Engineers, Boston, Massachusetts, USA.
igor.linkov@usace.army.mil

William Hynes, OECD, Paris, France.

Expert insight into current research
News & views

Palaeontology
The changing face of birds from the age of dinosaurs

Daniel J. Field

The fossil record traces the origin of the modern bird skull as birds evolved from their dinosaurian ancestors. Now the discovery of a bizarre fossil reveals a surprising diversion during this process of facial transformation. See p.272

As living dinosaurs, birds are the product of a long and complex evolutionary history that has given rise to more than 11,000 living species (ref. 1). The past decade has witnessed a surge of interest in the evolution of the avian skull, a structure that is hugely variable across the diversity of living birds (ref. 2). However, our ability to test hypotheses of how and when key transformations of the bird skull took place is limited if we can’t incorporate fossils into evolutionary models. On page 272, O’Connor et al. (ref. 3) report a stunning fossil-bird discovery from the age of the dinosaurs that reminds us of the crucial value of fossils for casting light on unexpected complexities in avian evolutionary history.

This striking addition to the aviary of the Mesozoic era is between 72 million and 66 million years old (corresponding to the latest stage of the Cretaceous period). It comes from Madagascar, and is named Falcatakely forsterae, which roughly translates as Forster’s small scythe beak. The name references the distinctive shape of the fossil’s bill and honours Catherine Forster’s numerous contributions to vertebrate palaeontology in Madagascar. The specimen is small (less than 9 centimetres long) and delicate (paper thin in places), yet the stunning bone preservation provides a spectacular look at this ancient creature’s anatomy.

Although the fossil consists of only the front half of a skull, it’s clear that Falcatakely is more than just a pretty face. The skull is utterly bizarre, characterized by a deep and elongated snout (Fig. 1) unlike those seen in any other Mesozoic birds. The skull’s architecture becomes even weirder. The very tip of its snout has one small preserved tooth (the tip possibly had more teeth that were not preserved); however, there are clearly no teeth anywhere else along its jaws. By contrast, the closest relatives of modern birds from the time of the dinosaurs show the opposite pattern, with teeth found throughout the jaws, but none at the tip of the beak (Fig. 1; ref. 4). These features give the skull of Falcatakely an almost comical profile: imagine a creature resembling a tiny, buck-toothed toucan flitting from branch to branch, occasionally glancing down at Madagascar’s formidable Late Cretaceous inhabitants, which included equally bizarre mammals (ref. 5) and giant predatory dinosaurs (ref. 6). Despite the present-day catalogue of approximately 200 Mesozoic bird species from around the world (ref. 7), ranging in age from about 150 million to 66 million years old, none has a skull resembling anything like that of Falcatakely. Its discovery reveals a skull shape previously unknown for any bird from the age of the dinosaurs.

[Figure 1 panels: skulls of Falcatakely, Ichthyornis and Asteriornis, arranged towards living birds; labelled bones: lacrimal, nasal, maxilla (upper jaw), premaxilla (upper beak).]

Figure 1 | The evolution of ancient bird skulls. Discoveries of bird skulls from the Mesozoic era (the age of the dinosaurs) have revealed both how the skull of modern birds arose and the surprising variability of these ancient skulls (as illustrated by these fossils, reported between 2018 and 2020). O’Connor et al. (ref. 3) present their discovery of the skull of a bird specimen they name Falcatakely forsterae, which shows an unusually deep and elongated snout, with teeth (at least one tooth and possibly more) positioned only at the very tip of the upper jaw in a skull region called the premaxilla. Like other distant relatives of modern birds, such as non-avian dinosaurs, the upper jaw of Falcatakely consists mainly of a region called the maxilla. Closer relatives of modern birds, such as Ichthyornis (ref. 4), had teeth throughout the jaws, except at the tip, and retained the ancestrally large maxilla. Early modern birds, including Asteriornis (an ancient relative of chickens and ducks; ref. 13), lost their teeth completely, and had upper jaws dominated by the premaxilla. Nasal bones are shown in grey and lacrimal bones (inferred for Asteriornis) are in beige. (Figure adapted from Fig. 2 of ref. 3, Fig. 3 of ref. 4 and Fig. 1 of ref. 13.)

The exceptional degree of preservation of Falcatakely enabled the authors to make other astonishing findings. Imaging using a method called high-resolution microcomputed tomography enabled them to digitally ‘extract’ the fragile skull bones from the surrounding rock. O’Connor and colleagues could then reassemble the delicate components of the bill, including elements such as the paper-thin palate bones, which are rarely found preserved, into a compelling 3D model (see Supplementary Videos 1–8 of ref. 3).

Studying the palate, the authors spotted a surprising bone called the ectopterygoid. This is absent in living birds, but is a component of the palate of non-avian dinosaurs and early bird-like forms, such as the iconic early birds Archaeopteryx and Sapeornis (ref. 8). However, on the basis of detailed analyses, O’Connor et al. infer that Falcatakely belongs to a group of Mesozoic ‘pre-modern’ birds called Enantiornithes

(a name that means ‘opposite birds’, in reference to their atypical shoulder-joint articulations), which occupy a branch of the dinosaur family tree that is much closer to that of modern birds than the branches occupied by either Archaeopteryx or Sapeornis. The presence of an ectopterygoid in Enantiornithes has been suggested previously (ref. 9), but this identification has been questioned (ref. 10). Thus, the detection of an ectopterygoid in Falcatakely either shows that this ancestral component of the palate was indeed retained in Enantiornithes (at a relatively late stage in avian evolutionary history), or challenges the identification of Falcatakely as a member of Enantiornithes, suggesting instead that it belongs on a deeper branch of the family tree of Mesozoic birds.

Although it is impossible to decide definitively between these two options without access to further fossil material, O’Connor et al. grapple with this uncertainty to an impressively thorough degree, showing that Falcatakely nests with Enantiornithes in evolutionary trees constructed under a range of alternative analytical approaches. Moreover, the identification of Falcatakely as a member of Enantiornithes makes sense in light of the previous identification of fragmentary bones assigned to Enantiornithes from the same Madagascan fossil locality (ref. 11). Nonetheless, some research has indicated that family-tree reconstructions of dinosaurs can return conflicting results when skulls, instead of complete skeletons, are analysed (ref. 12). This lack of certainty is all the more reason for the team to continue its productive fieldwork in the hope of discovering more-complete material.

Modern birds originated in the Late Cretaceous (ref. 13), and it has become increasingly apparent that the final 20 million years of the age of the dinosaurs (86 million to 66 million years ago) was a pivotal time in avian evolutionary history. The discovery of Falcatakely shows us that the importance of this window in time for bird evolution extends well beyond the origin of modern birds. Apparently, ‘pre-modern’ bird lineages such as Enantiornithes were still experimenting with bold new forms (and possibly previously unfilled ecological niches) well into the terminal stages of the Cretaceous.

The pre-modern birds were wiped out in the end-Cretaceous mass extinction event, along with all other dinosaurs, apart from modern birds (ref. 14). Considering the impressive diversity and global distribution of Enantiornithes in the Late Cretaceous, determining why they disappeared in that mass extinction, whereas the earliest modern-bird lineages survived, remains one of the greatest mysteries in avian evolutionary history. The answers to such questions, much like the unexpected anatomy of creatures such as Falcatakely, can be revealed only by evidence from the fossil record. So, let’s keep digging.

Daniel J. Field is in the Department of Earth Sciences, University of Cambridge, Cambridge CB2 3EQ, UK.
e-mail: djf70@cam.ac.uk

1. del Hoyo, J. All the Birds of the World (Lynx, 2020).
2. Felice, R. N. & Goswami, A. Proc. Natl Acad. Sci. USA 115, 555–560 (2018).
3. O’Connor, P. M. et al. Nature 588, 272–276 (2020).
4. Field, D. J. et al. Nature 557, 96–100 (2018).
5. Krause, D. W. et al. Nature 515, 512–517 (2014).
6. Lavocat, R. Bull. Mus. Natl Hist. Nat. 27, 256–259 (1955).
7. Pittman, M. et al. in Pennaraptoran Theropod Dinosaurs: Past Progress and New Frontiers (eds Pittman, M. & Xing, X.) 37–95 (Bull. Am. Mus. Natl Hist. No. 440; Am. Mus. Natl Hist., 2020).
8. Hu, H. et al. Proc. Natl Acad. Sci. USA 116, 19571–19578 (2019).
9. Elzanowski, A. Cour. Forschungsinst. Senckenb. 181, 37–53 (1995).
10. Mayr, G. Avian Evolution: The Fossil Record of Birds and its Paleobiological Significance (Wiley, 2016).
11. O’Connor, P. M. & Forster, C. A. J. Vert. Paleontol. 30, 1178–1201 (2010).
12. Li, Y., Ruta, M. & Wills, M. A. Syst. Biol. 69, 638–659 (2020).
13. Field, D. J., Benito, J., Chen, A., Jagt, J. W. M. & Ksepka, D. T. Nature 579, 397–401 (2020).
14. Longrich, N. R., Tokaryk, T. & Field, D. J. Proc. Natl Acad. Sci. USA 108, 15253–15257 (2011).

This article was published online on 25 November 2020.

Particle physics
How protons interact with their exotic siblings

Manuel Lorenz

The nuclear forces that act on short-lived subatomic particles have been hard to study. This problem has now been solved by smashing high-energy protons together and measuring the momenta of the unstable particles produced. See p.232

On page 232, the ALICE Collaboration (ref. 1) reports that data from high-energy collisions between protons can be used to investigate the little-understood nuclear forces between protons and subatomic particles called hyperons. The measurements have comparable precision to state-of-the-art numerical calculations of the forces, thereby allowing conclusive quantitative comparisons of experimental data with theory. Accurate knowledge of these forces is needed for various aspects of physics research, for example in efforts to understand the stability of neutron stars.

The nuclear force between neutrons and protons (which are known collectively as nucleons) is a residual effect of the strong interaction that acts between their elementary constituents (quarks and gluons). First-principles calculations of the nuclear force have been challenging because of the peculiarities of the strong interaction. Our knowledge of this force is, therefore, based largely on simplified models and theories (ref. 2), guided by experimental data (ref. 3). The strong interaction between hadrons (subatomic particles, such as nucleons, that consist of two or more quarks bound together by the strong interaction) at low energies is therefore often referred to as the final frontier of the standard model of particle physics.

The interaction between nucleons has been measured with high accuracy (ref. 3), but the interaction of nucleons with their heavier siblings, the hyperons, is less well assessed. Hyperons consist of three quarks, at least one of which must be a type (flavour) known as a strange quark; the other quarks can be up or down, the two lightest quark flavours. Hyperons are not present in the everyday matter that surrounds us on Earth, but, depending on their interactions with nucleons, might affect the compressibility of nuclear matter at high densities. This means they could be relevant to the stability of neutron stars (ref. 4). Precise knowledge of hyperon–nucleon interactions is therefore of great importance not only for nuclear physics, but also for astrophysics. However, measurements of these interactions are difficult to make in conventional experiments involving direct particle collisions in accelerators, because hyperons are short-lived (their lifetimes are about 10⁻¹⁰ s; ref. 5) and fly only a few centimetres, on average, before they decay.

The ALICE Collaboration now reports that proton–hyperon interactions can be investigated using high-energy collisions between protons carried out at the Large Hadron Collider (LHC) at CERN, Europe’s particle physics laboratory near Geneva, Switzerland. The technique depends on measurements of correlations between the momenta of protons and hyperons produced in the collisions.

The process studied in the experiments involves three steps (Fig. 1). First, protons are collided at extremely high energies, taking advantage of the fact that the LHC produces higher collision energies than any other accelerator. Second, hadrons are emitted

by a ‘source’ produced by the collision: a volume of space in which quarks and gluons that originally came from the protons interact and become confined within new hadrons. The source emits various types of hadron, including protons and hyperons, some of which form proton–hyperon pairs. Finally, the proton and hyperon in each of these pairs interact with each other in ways that alter the momentum of the paired system. This momentum is measured by a detector and used to determine the momentum correlations.

Figure 1 | Investigating the proton–hyperon interaction. a, The ALICE Collaboration1 smashed together high-energy protons in CERN’s Large Hadron Collider. b, The collisions generate a ‘particle source’, a volume of space in which components of the colliding protons interact and become confined within new particles. These new particles are emitted from the source, and include protons that pair up with heavier particles known as hyperons. c, The paired-up protons and hyperons interact with each other in a way that alters the relative momentum of the system, which is then measured by a detector. These measurements are then used to determine the nuclear force between the proton–hyperon pair.

The momentum correlations reflect the size of the hadron source and the properties of the interaction between the produced proton–hyperon pairs. Such correlation analyses were originally used to determine the source size in collisions of heavy ions6, but in the new work, they are instead used to investigate the interaction between the particles of interest. This approach to studying particle interactions was pioneered by the HADES Collaboration7 at the GSI Helmholtz Centre for Heavy Ion Research in Darmstadt, Germany, and was further developed by the ALICE collaboration8 at the LHC. The current work depends on the fact that the extremely high-energy proton–proton collisions carried out at the LHC produce a high abundance of hyperons from small-volume hadron sources. The authors used this method to measure the strong force between protons and Ω⁻ hyperons (which consist of three strange quarks) and between protons and Ξ hyperons (which consist of two strange quarks and one up or down quark).

The ALICE Collaboration’s findings open up a new ‘laboratory’ for investigating other nucleon–hyperon interactions, including the little-explored interactions with hyperons that contain two or three strange quarks. This will aid our understanding of metastable states of hyperon pairs or of the compressibility of nuclear matter at high densities. The latter is relevant not only for the stability of neutron stars, but also for neutron-star mergers and heavy-ion collisions.

In a lucky coincidence, recent developments in theoretical physics9,10 allow nuclear forces to be calculated from first principles so that the results can be compared with experimental findings. The precision with which nucleon–nucleon interactions can be determined from experimental data is still superior to that obtained from these calculations, but the ALICE Collaboration’s measurements of the proton–hyperon interactions almost exactly match those obtained from theory.

A wealth of high-precision measurements of proton–hyperon interactions is expected from the LHC in the next decade, following on from its recent upgrade. Moreover, various other facilities that will study particle collisions at lower energies than those produced at the LHC are expected to go into full operation in the coming years, including NICA in Russia, J-PARC in Japan and FAIR in Germany. Although fewer proton–hyperon pairs are generated per collision in lower-energy collisions, a greater proportion of those pairs will be emitted at low momenta, which might turn out to be advantageous, because more data are needed to reduce the statistical errors in measurements of low-momentum systems. Increases in computing power should also substantially reduce the uncertainties of first-principles calculations of nuclear forces. Taken together, these developments bode well for future research into the final frontier of the standard model of particle physics.

Manuel Lorenz is at the Institute for Nuclear Physics, Goethe University, Frankfurt 60438, Germany.
e-mail: m.lorenz@gsi.de

1. ALICE Collaboration. Nature 588, 232–238 (2020).
2. Epelbaum, E., Hammer, H.-W. & Meissner, U.-G. Rev. Mod. Phys. 81, 1773–1825 (2009).
3. Stoks, V. & de Swart, J. Phys. Rev. C 47, 761–767 (1993).
4. Weissenborn, S., Chatterjee, D. & Schaffner-Bielich, J. Phys. Rev. C 85, 065802 (2012).
5. Tanabashi, M. et al. Phys. Rev. D 98, 030001 (2018).
6. Lisa, M. A., Pratt, S., Soltz, R. & Wiedemann, U. Annu. Rev. Nucl. Part. Sci. 55, 357–402 (2005).
7. Adamczewski-Musch, J. et al. Phys. Rev. C 94, 025201 (2016).
8. Acharya, S. et al. Phys. Rev. C 99, 024001 (2019).
9. Sasaki, K. et al. Nucl. Phys. A 998, 121737 (2020).
10. Iritani, T. et al. Phys. Lett. B 792, 284–289 (2019).

Virology
Cracking the cell access code for a deadly virus

James Zengel & Jan E. Carette

The discovery that the receptor protein LDLRAD3 is essential for infection of human cells by Venezuelan equine encephalitis virus could inform strategies to combat this potentially lethal infection. See p.308

When viruses jump from animals to humans, disease outbreaks can follow. A striking example is Venezuelan equine encephalitis virus (VEEV). This virus causes sporadic disease outbreaks in horses in Latin America that frequently spill over into humans, resulting in often-deadly neurological disease1. Because of its pathogenicity in livestock and humans, VEEV has been studied as a biological weapon by several countries, including the United States2. Treatments for the disease are therefore highly desirable. It has been unknown how VEEV co-opts cellular pathways to establish infection in people; in particular, which host receptor protein allows VEEV to cross the cell membrane and initiate its replication cycle. On page 308, Ma et al.3 describe the long-sought receptor for VEEV, and show that it is essential for viral replication in both human cells and mouse models.

Interactions between a virus and its host receptor protein can control which tissue types in the body support viral growth, thus influencing the type of disease that results. Furthermore, these interactions can determine how well the virus spreads through a host population. During the continuing SARS-CoV-2 pandemic, for instance, viral strains that had a specific mutation in the virus’s spike protein became predominant soon after the virus
jumped to humans; this mutation enhances binding between the spike protein and its receptor on human cells, ACE2 (ref. 4).

Despite the importance of host receptors for understanding infection, their identities for VEEV and other alphaviruses (a category of mainly mosquito-borne RNA viruses) have mostly been elusive. Alphaviruses that infect humans can cause either severe arthritis or, as VEEV does, inflammation of the brain (encephalitis). In 2018, previous work5 from some of the authors of the current study uncovered Mxra8 as a mammalian receptor protein for multiple arthritogenic alphaviruses, but not for encephalitis-causing alphaviruses.

Ma et al. therefore went in search of the mammalian receptor for VEEV. The authors made use of a gene-editing tool called CRISPR–Cas9 to introduce mutations into more than 20,000 genes in mouse neuronal cells. They then screened the cells to determine which mutations prevented infection by a modified form of VEEV (the version used was less pathogenic than normal, to enable safe experimentation in the laboratory). The screen revealed that Ldlrad3 was the gene most commonly mutated in infection-resistant cells. Subsequent experiments in a broad range of human and mouse cell types demonstrated that the LDLRAD3 protein is essential for VEEV entry into host cells.

LDLRAD3 is a poorly characterized member of a large group of membrane-bound receptors called the LDL scavenger receptor family. This family is mainly known for its role in bringing lipoprotein particles into the cell in vesicles (a process called endocytosis). Other members of the family have been shown to be co-opted by viruses unrelated to alphaviruses to gain entry into the cell6.

Ma and colleagues identified a specific region called domain 1 (D1) in the extracellular portion of LDLRAD3 through which VEEV-like particles directly bind to the receptor (Fig. 1a). But, intriguingly, deletion of the intracellular domain of LDLRAD3 (which typically mediates endocytosis in this receptor family) did not prevent VEEV entry. This could mean that binding of VEEV to LDLRAD3 triggers fusion of the viral and cell membranes, resulting in direct release of viral RNA into the cell. Alternatively, LDLRAD3 might mainly mediate virus binding, with another, unknown factor controlling endocytosis. Future studies are needed to distinguish between these possibilities.

Finally, the authors investigated whether modulating LDLRAD3 could protect mice from VEEV infection. Strikingly, deletion of Ldlrad3 completely protected the animals from otherwise-lethal infection with highly pathogenic VEEV strains (Fig. 1b). The authors gave wild-type mice a soluble form of LDLRAD3 in which D1 of the receptor was fused to part of an antibody. The construct binds to VEEV, preventing interactions with LDLRAD3 on cells. Administration of the soluble construct before or after infection with VEEV led to near-complete protection in wild-type mice.

Figure 1 | Preventing infection by Venezuelan equine encephalitis virus (VEEV) in mice. a, Ma et al.3 report that LDLRAD3 is the mammalian receptor protein for VEEV. Entry of VEEV into cells is mediated by binding to LDLRAD3’s domain 1 (D1). VEEV infection in mice causes severe disease and death in all cases. b, The authors fused D1 to part of an antibody. This construct acts as an antiviral decoy, binding to VEEV and so preventing it from interacting with LDLRAD3. The decoy treatment protected mice from VEEV infection; the animals showed few signs of disease and had a much higher survival rate than did untreated animals.

A key question is whether LDLRAD3 mainly mediates infection in the brain, where it causes encephalitis, or whether it also takes part in VEEV infection in the different cell types involved in spreading the virus through the body after an initial mosquito bite. The broad expression pattern of LDLRAD3 in many tissues suggests that the protein has roles in viral spread throughout the body. Indeed, the authors show that soluble LDLRAD3 almost totally blocked virus replication in several tissues that are involved in such spread, including the blood serum, spleen and brain.

In the future, deletion of Ldlrad3 in specific mouse tissues could help to reveal more about how VEEV spreads and causes disease. By abolishing infection in chosen tissues in this way, one could rigorously test various unknown aspects of disease progression. Are blood cells (specifically, a type called myeloid cells) key in initiating viral spread and fuelling viral infection in the blood after a mosquito bite? Does VEEV reach the central nervous system through the peripheral nervous system, or more directly by crossing the blood–brain barrier7? And do these steps require LDLRAD3?

LDLRAD3 mediates cell entry of VEEV, but Ma and colleagues found that it does not control entry of related encephalitic alphaviruses such as Western and Eastern equine encephalitis viruses. This is somewhat unexpected, given the strong similarities in structure8 and pathogenesis between the three. An intriguing possibility is that other members of the LDL scavenger receptor family act as receptors for distinct encephalitic alphaviruses. Further structural studies defining the LDLRAD3–VEEV interface will provide clues to why this interaction is seemingly so specific.

What are the therapeutic implications of Ma and colleagues’ work? Precise characterization of the LDLRAD3-binding site on VEEV could aid the development of highly neutralizing antibodies that block the VEEV–LDLRAD3 interaction. A similar antibody therapy that prevents interactions between the Ebola virus and its human receptor protein NPC1 has shown success, reducing the mortality from Ebola9,10. Another strategy is to use soluble LDLRAD3 as an antiviral decoy. The authors have provided strong proof of principle in mice that this might work, although further optimization to enhance the binding affinity and half-life of soluble LDLRAD3 in vivo might be required, equivalent to developing engineered ACE2 that has a greatly enhanced potency in blocking SARS-CoV-2 infection11.

The discovery of LDLRAD3 has therefore revealed a range of ways in which we might, in the future, combat severe VEEV disease.

James Zengel and Jan E. Carette are in the Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, California 94305, USA.
e-mail: carette@stanford.edu

1. Weaver, S. C., Ferro, C., Barrera, R., Boshell, J. & Navarro, J.-C. Annu. Rev. Entomol. 49, 141–174 (2004).
2. Bronze, M. S., Huycke, M. M., Machado, L. J., Voskuhl, G. W. & Greenfield, R. A. Am. J. Med. Sci. 323, 316–325 (2002).
3. Ma, H. et al. Nature 588, 308–314 (2020).
4. Yurkovetskiy, L. et al. Cell 183, 739–751 (2020).
5. Zhang, R. et al. Nature 557, 570–574 (2018).
6. Hofer, F. et al. Proc. Natl Acad. Sci. USA 91, 1839–1842 (1994).
7. Charles, P. C., Walters, E., Margolis, F. & Johnston, R. E. Virology 208, 662–671 (1995).
8. Hasan, S. S. et al. Cell Rep. 25, 3136–3147 (2018).
9. Mulangu, S. et al. N. Engl. J. Med. 381, 2293–2303 (2019).
10. Misasi, J. et al. Science 351, 1343–1346 (2016).
11. Chan, K. K. et al. Science 369, 1261–1265 (2020).

This article was published online on 18 November 2020.

Climate science
Trade-offs for equitable climate policy assessed

Wei Peng

Computational models show that regionally varied prices for carbon emissions can greatly reduce the need for poor countries to receive financial assistance to tackle climate change, while still stabilizing global warming. See p.261

International agreements for tackling climate change face many challenges. One of the knottiest is how to allocate mitigation efforts fairly across nations, without increasing the overall global cost, and without asking poor countries to accept large amounts of financial assistance that raise concerns about infringements of national sovereignty. On page 261, Bauer et al.1 report an analysis of the trade-off between cost and sovereignty for various international climate policies. They conclude that sovereignty concerns can be allayed substantially with only slightly higher global costs by using a strategy in which the carbon price (the charge per tonne of carbon dioxide emissions) is varied modestly to account for each country’s ability to pay.

After decades of gridlock in climate diplomacy, the 2015 Paris agreement received support from nearly 200 countries for the collective goal of limiting global warming by the end of this century to well below 2 °C above pre-industrial levels. The key to its success was that it provided much-needed flexibility for countries to make their own nationally determined contributions to reducing greenhouse-gas emissions, instead of setting targets through a centralized treaty, as was the case for the 1997 Kyoto Protocol (go.nature.com/3oa6uvl).

By allowing countries to tailor commitments to what can realistically be delivered, the Paris approach reflects the United Nations’ equity principle of ‘common but differentiated responsibilities and respective capabilities’. It also reduces the need for financial transfer: the provision of money for poor countries to help them tackle climate change and to compensate for the negative economic consequences of mitigating global warming. Reducing this need is politically beneficial for poor countries because they sometimes perceive a heavy reliance on international financial transfer as undermining their national sovereignty. However, the Paris approach could be more expensive overall than the idealized strategy in which all countries pay a uniform carbon price, because emissions might not be reduced in the places in which it is cheapest to do this. The benefits of the Paris approach for feasibility, equity and sovereignty therefore come at the cost of economic inefficiencies.

Bauer et al. quantify the trade-offs between the global cost of climate mitigation (low cost equates to high economic efficiency) and the amount of international financial transfer needed (a proxy for concerns about sovereignty), to find a balance that would enable the 2 °C goal to be achieved with equitable effort-sharing, ensuring the same ratio of mitigation cost to income for all countries.

The authors started by analysing two extreme policies. The first involves setting a globally uniform carbon price. This is the cheapest mitigation strategy overall, but requires large international transfer to ensure fair effort-sharing. The second policy is to have widely variable carbon prices across regions to ensure fair effort-sharing. This avoids transfers, but is less economically efficient globally.

The authors then explored a series of hybrid scenarios in between these two extremes, changing the degree of variation in regional carbon prices and calculating the international transfer required to achieve fair effort-sharing in each case. By combining and comparing these scenarios, the authors plot a curve that depicts the trade-off between global cost and financial transfers (see Fig. 3a of the paper1).

Bauer et al. show that small deviations from a globally uniform carbon price can achieve the 2 °C goal with slightly higher mitigation costs, but much lower transfers. For instance, a modest regional variation in carbon prices with a standard deviation of US$14 per tonne in 2030 leads to a negligible increase in global mitigation costs, but an 18% reduction in required transfers. This implies that it is highly possible to deliver reasonably good outcomes for equity, economic efficiency and sovereignty, if policymakers are willing to deviate modestly from the economically most efficient strategy (a uniform carbon price).

Figure 1 | Panel inspection at the Sumber Solar Plant in Mongolia. The construction of this solar plant was supported by the Green Climate Fund, the largest international fund for financing efforts to tackle global warming. However, a heavy reliance on such support is sometimes seen by poor countries as an infringement of their national sovereignty. Bauer et al.1 report an analysis of climate policies that suggests a way to reduce the need for financial support. (Credit: GCF/ANGELI MENDOZA)

The authors’ findings highlight an advantage of the Paris approach: an equitable outcome can still be achieved when international transfers are reduced. At the 2009 United Nations Climate Change Conference, wealthy countries pledged $100 billion a year by 2020 to help developing countries tackle climate change. But the total amount collected from public and private sources in 2018 was less than $80 billion (go.nature.com/39gnts7). US President Donald Trump’s decision to withdraw $2 billion that had been promised to the Green Climate Fund, the largest international fund for financing
efforts to mitigate and adapt to climate change (Fig. 1), contributed to the shortfall. These realities demonstrate the uncertainties in mobilizing large-scale international transfer. By reducing the need for transfer, the Paris approach creates a more collaborative space for engaging both developed and developing countries in climate diplomacy.

Bauer and colleagues also point out plausible unintended consequences of the Paris approach for environmental sustainability. If the variation in carbon price across countries is large and the 2 °C target remains the global objective, then developing countries might make only limited efforts to tackle climate change. This could push developed countries to adopt costly options, such as using a technology called bioenergy with carbon capture and storage (BECCS). The authors find that, in this scenario, countries that do not belong to the Organisation for Economic Co-operation and Development (OECD) will export bioenergy to OECD countries for use in BECCS, exacerbating deforestation and land-use intensification in the global south. The negative effects of BECCS on sustainability have received wide attention2, but Bauer and co-workers’ study emphasizes that these effects could worsen under the Paris approach, if mitigation efforts shift across countries.

The authors’ assessment is based on the REMIND-MAgPIE computational model, which is one of the ‘integrated assessment models’ (IAMs) used by the Intergovernmental Panel on Climate Change to explore how different policy and technology pathways might affect global emissions and future climate3,4. Such models include representations of economic, energy and land systems, as well as the interactions between them. IAMs have been important analytical tools for designing climate policies5, but Bauer and colleagues’ study is exemplary in that an IAM is used to evaluate competing considerations faced by climate policymakers.

The new findings are a useful contribution to the high-level debate about the modes and objectives of climate policy, but crucial aspects of the modelling framework lack the granularity needed to inform real-world decision-making. For instance, the model considers just 12 world regions, whereas climate decisions are often made at national and subnational levels. Bauer et al. also use carbon prices as a proxy for climate policy, whereas policymakers need to choose from a range of low-carbon policies that vary in cost and feasibility.

Furthermore, countries decide their own carbon prices in the real world, whereas regional carbon prices are not allowed to vary independently in Bauer and colleagues’ model. Instead, the authors use a rescaling method to adjust the whole set of regional carbon prices, to make computation easier.

Finally, international transfer is modelled as the total financial flow from rich to poor countries, without taking into account specific financing mechanisms. However, a wide range of financing mechanisms exist, such as grants, subsidized loans and private finance, which involve different actors and terms. These limitations do not overshadow the value of the new work, but future efforts should engage model developers and users to bring IAMs closer to real-world problems.

Wei Peng is in the School of International Affairs and the Department of Civil and Environmental Engineering, Pennsylvania State University, University Park, Pennsylvania 16802, USA.
e-mail: weipeng@psu.edu

1. Bauer, N. et al. Nature 588, 261–266 (2020).
2. Heck, V., Gerten, D., Lucht, W. & Popp, A. Nature Clim. Change 8, 151–155 (2018).
3. Clarke, L. et al. in Climate Change 2014: Mitigation of Climate Change. Contribution of Working Group III to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change (eds Edenhofer, O. et al.) Ch. 6, 413–510 (Cambridge Univ. Press, 2014).
4. Hoegh-Guldberg, O. et al. in Global Warming of 1.5°C (eds Masson-Delmotte, V. et al.) Ch. 3, 175–311 (IPCC, 2018).
5. van Beek, L., Hajer, M., Pelzer, P., van Vuuren, D. & Cassen, C. Glob. Environ. Change 65, 102191 (2020).
Article
Detection of large-scale X-ray bubbles in the Milky Way halo

P. Predehl1 ✉, R. A. Sunyaev2,3 ✉, W. Becker1,4, H. Brunner1, R. Burenin2, A. Bykov5, A. Cherepashchuk6, N. Chugai7, E. Churazov2,3 ✉, V. Doroshenko8, N. Eismont2, M. Freyberg1, M. Gilfanov2,3 ✉, F. Haberl1, I. Khabibullin2,3, R. Krivonos2, C. Maitra1, P. Medvedev2, A. Merloni1 ✉, K. Nandra1 ✉, V. Nazarov2, M. Pavlinsky2,11, G. Ponti1,9, J. S. Sanders1, M. Sasaki10, S. Sazonov2, A. W. Strong1 & J. Wilms10

https://doi.org/10.1038/s41586-020-2979-0
Received: 10 July 2020
Accepted: 25 September 2020
Published online: 9 December 2020

The halo of the Milky Way provides a laboratory to study the properties of the shocked
hot gas that is predicted by models of galaxy formation. There is observational
evidence of energy injection into the halo from past activity in the nucleus of the Milky
Way1–4; however, the origin of this energy (star formation or supermassive-black-hole
activity) is uncertain, and the causal connection between nuclear structures and
large-scale features has not been established unequivocally. Here we report
soft-X-ray-emitting bubbles that extend approximately 14 kiloparsecs above and
below the Galactic centre and include a structure in the southern sky analogous to the
North Polar Spur. The sharp boundaries of these bubbles trace collisionless and
non-radiative shocks, and corroborate the idea that the bubbles are not a remnant of a
local supernova5 but part of a vast Galaxy-scale structure closely related to features
seen in γ-rays6. Large energy injections from the Galactic centre7 are the most likely
cause of both the γ-ray and X-ray bubbles. The latter have an estimated energy of
around 10^56 erg, which is sufficient to perturb the structure, energy content and
chemical enrichment of the circumgalactic medium of the Milky Way.
eROSITA8 is a large-collecting-area and wide-field-of-view X-ray telescope, launched into space onboard the Spektr-RG mission on 13 July 2019. Over the course of six months (December 2019–June 2020), Spektr-RG and eROSITA have completed a survey of the whole sky at energies of 0.2–8 keV, much deeper than the only other all-sky survey with an X-ray-imaging telescope, which was performed by ROSAT in 1990 at energies of 0.1–2.4 keV.
The sky map from the first eROSITA all-sky survey is shown in Fig. 1. This image has been created from calibrated events in the energy range 0.3–2.3 keV (Methods). A preliminary analysis indicates that more than one million X-ray point sources and about 20,000 extended ones are detected in the survey. This is comparable to, and may exceed, the total number of X-ray sources known before eROSITA launched. Multiwavelength identifications using the WISE and Gaia catalogues9,10 suggest that about 80% of the point sources are distant active galactic nuclei (AGN; comprising about 80% of all known blazars) and that around 20% are coronally active stars in the Milky Way, including about 150 planet-hosting stars (roughly 10% of all known outside of the Kepler field).
Various very large and diffuse extended structures are visible in the all-sky survey map. The most obvious is a quasi-circular feature, which is part of the North Polar Spur and Loop I (northwest quadrant) discovered in the early days of X-ray and radio astronomy, respectively5,11.
Although less evident at first glance, close inspection of the medium-energy-band (0.6–1.0 keV) image in the hemisphere below the plane of the Milky Way (‘south’) reveals an astonishing new feature: a huge circular annulus of similar shape and scale to the structure seen in the north (Fig. 2). Together, they seem to form a pair of ‘bubbles’ that emerge from the Galactic centre. They are traceable at various levels of intensity throughout most of the sky, and should represent a very large object (several kiloparsecs), akin to the Fermi bubbles, because local features are unlikely to exhibit the fourfold symmetry around the direction towards the centre of the Galaxy.
The Fermi bubbles were discovered in 2010 (ref. 1) with the Fermi-LAT (Fermi large-area telescope) γ-ray instrument. They have a hard, non-thermal spectrum, which shows up clearly in maps at energies of more than 1 GeV. Their emission is probably due to inverse Compton scattering of cosmic-ray electrons on the cosmic microwave background and other radiation fields. This kiloparsec-scale structure was quickly interpreted as a possible manifestation of past activity of the now dormant supermassive black hole in the centre of the Milky Way, thus linking it with AGN observed outside the Galaxy12–15. Alternatively, a burst of star formation could power the bubbles16–18. In either case, the energy needed to power their formation must have been very large, at roughly 10^55 erg (refs 1, 19).
X-ray emission from the North Polar Spur had already been found by ROSAT5. Although considered in most early models to be a nearby
1Max-Planck-Institut für Extraterrestrische Physik, Garching, Germany. 2Space Research Institute of the Russian Academy of Sciences, Moscow, Russia. 3Max-Planck-Institut für Astrophysik,
Garching, Germany. 4Max-Planck-Institut für Radioastronomie, Bonn, Germany. 5Ioffe Institute, St Petersburg, Russia. 6M. V. Lomonosov Moscow State University, P. K. Sternberg Astronomical
Institute, Moscow, Russia. 7Institute of Astronomy, Russian Academy of Sciences, Moscow, Russia. 8Institut für Astronomie und Astrophysik, Tübingen, Germany. 9INAF-Osservatorio
Astronomico di Brera, Merate, Italy. 10Dr. Karl-Remeis-Sternwarte Bamberg and Erlangen Centre for Astroparticle Physics, Universität Erlangen-Nürnberg, Bamberg, Germany.
11Deceased: M. Pavlinsky. ✉e-mail: predehl@mpe.mpg.de; sunyaev@iki.rssi.ru; churazov@iki.rssi.ru; gilfanov@iki.rssi.ru; am@mpe.mpg.de; knandra@mpe.mpg.de

Article
Fig. 1 | The Spektr-RG–eROSITA all-sky map. An RGB map of the first Spektr-RG–eROSITA all-sky survey (red for 0.3–0.6 keV, green for 0.6–1.0 keV, blue for 1.0–2.3 keV) is shown in Galactic coordinates, using a Hammer–Aitoff projection. The original image, with a resolution of about 12″, was smoothed (with a Gaussian with a full-width at half-maximum (FWHM) of 10′) to generate this one. Image adapted from ref. 34. Credit: Jeremy Sanders, Hermann Brunner, Andrea Merloni and the eSASS team (MPE); Eugene Churazov, Marat Gilfanov (on behalf of IKI).
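The caption names two standard operations: a Hammer–Aitoff projection and Gaussian smoothing specified by its FWHM. The sketch below illustrates both with the textbook formulas. It is not the authors' pipeline; the function names and the sign convention for the x axis are assumptions made for illustration.

```python
import math


def hammer_aitoff(lon_deg, lat_deg):
    """Map longitude and latitude (degrees) to Hammer-Aitoff plane coordinates.

    Longitude is wrapped to [-180, 180) so that (0, 0) lands at the origin.
    The outputs satisfy |x| <= 2*sqrt(2) and |y| <= sqrt(2).
    Assumption: x increases with longitude; sky maps usually mirror this axis.
    """
    lon = math.radians((lon_deg + 180.0) % 360.0 - 180.0)
    lat = math.radians(lat_deg)
    denom = math.sqrt(1.0 + math.cos(lat) * math.cos(lon / 2.0))
    x = 2.0 * math.sqrt(2.0) * math.cos(lat) * math.sin(lon / 2.0) / denom
    y = math.sqrt(2.0) * math.sin(lat) / denom
    return x, y


def fwhm_to_sigma(fwhm):
    """Convert a Gaussian full-width at half-maximum to its standard deviation."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


if __name__ == "__main__":
    # A few sample directions: the centre, a pole and a mid-latitude point.
    for lon, lat in [(0.0, 0.0), (0.0, 90.0), (40.0, 50.0)]:
        print((lon, lat), hammer_aitoff(lon, lat))
    # Standard deviation of the 10-arcmin FWHM kernel quoted in the caption.
    print("sigma (arcmin):", fwhm_to_sigma(10.0))
```

The projection is equal-area, so a smoothing kernel of fixed angular size covers the same fraction of the map area wherever it is applied, although its shape is distorted away from the map centre.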
supernova remnant nearly surrounding us, the possibility that the North Polar Spur is of Galactic scale has been proposed6,20, and is supported by several observational arguments7,21. In particular, study of absorption in X-ray and radio bands places a lower limit of 300 pc on the distance to the structure21, which rules out a nearby supernova remnant. In addition, evidence for a large-scale bipolar wind has been presented, based purely on X-ray and mid-infrared data, even before the discovery of the Fermi bubbles22.
With the eROSITA data, the full scope and morphology of these gigantic X-ray structures have become evident. ROSAT, owing to a combination of its lower sensitivity and softer energy response, could reveal only the brightest part of the southern loop closest to the Galactic plane1,22, not the whole structure. More recently, the 0.7–1-keV all-sky map from the solid-state slit camera (SSC) of MAXI also provided evidence of a southern enhancement on these large scales, and a close north–south symmetry23.
The Fermi bubbles and large-scale X-ray emission revealed by eROSITA show remarkable morphological similarity. We therefore suggest that the Fermi bubbles and the eROSITA structure are physically related, and refer to the latter as ‘eROSITA bubbles’. Our discovery confirms the previously suggested common origin of the two objects6,7. The motivation for a separate name is that, despite the probably common origin, the two structures differ in some important respects.
First, we compare their morphologies on the sky (Fig. 3). The Fermi bubbles are roughly elliptical, about 55° × 45° (north–south, east–west) in diameter, symmetric about the Galactic centre, with vertical axis perpendicular to the Galactic plane, and roughly uniform in γ-ray intensity. The eROSITA bubbles appear as extended as 80° in longitude, roughly 80°–85° in latitude and concentrated in annuli or shells. This suggests that they are, to first order, close to spherical, with a radius of about 6–7 kpc along the plane, extending radially on the Milky Way close to the Sun, so that their northern and southern edges are imprinted by the closer rim of the bubble. The full vertical extent of the eROSITA bubbles is more difficult to determine; assuming a spherical geometry, they would extend roughly 14 kpc above and below the Galactic plane (Extended Data Fig. 1).
Second, from a preliminary spectral analysis of eROSITA data, the absorbing column density of the diffuse emission in the southwestern bright rim of the eROSITA bubbles (white rectangle in Fig. 2) can be constrained to N_H = (1.0–3.5) × 10^21 cm^−2, consistent with what has been measured previously21 for the northern structure. One-dimensional cross-sections of the observed surface brightness at various latitudes (Fig. 2) are qualitatively consistent with the projection of (quasi-)spherical thick shells with an outer diameter of 14 kpc. Regardless of the uncertainties on these numbers, it is clear that the eROSITA bubbles are comparable in size to the Galactic disk24.
We note that the extended X-ray emission revealed by eROSITA coincides spatially with the soft component of the GeV emission reported to surround the Fermi bubbles2,7,25. A possible connection with polarized radio-continuum emission at 2.3 GHz and 23 GHz26 has yet to be explored.
An episodic or continuous energy release in the region of the Galactic centre is expected to generate a series of distinct structures: shocks and contact discontinuities. We see two prominent structures in our maps: one is the outer boundary of the eROSITA bubbles; the other separates the eROSITA bubbles and the Fermi bubbles. The sharp boundary of the eROSITA bubbles, which appears bright in X-rays, indicative of hotter gas at the boundary than outside it, clearly traces the presence of a non-radiative (or adiabatic) shock (see Methods for an estimate of the gas cooling time). We associate the boundary with a forward shock linked to the onset of large energy release at the Galactic centre. The nature of the boundary between the eROSITA and Fermi bubbles is less clear. It could be another forward shock (in the case of a sequence of energy releases), a reverse shock, a wind-termination shock or a contact discontinuity. The reverse or termination shock models for the Fermi bubbles would imply an additional contact discontinuity somewhere between the Fermi and eROSITA bubbles, which is not apparent in the data. Instead, we consider the simplest scenario in which the eROSITA

[Fig. 2 image. Panel a: sky map overlaid with a coordinate grid (latitude labels from −90° to 85°; hour labels from 0 h to 23 h). Panel b: surface brightness (counts s^−1 deg^−2) against Galactic longitude (°, 100 to −100) in six panels labelled +60°, +50°, +40°, −60°, −50° and −40°.]
Fig. 2 | The soft-X-ray eROSITA bubbles. a, False-colour map of extended emission detected by eROSITA in the 0.6–1.0-keV range. The contribution of the point sources has been removed and the scaling adjusted to enhance large-scale structures in the Galaxy. b, One-dimensional surface-brightness profiles in the same energy band (red lines with pink shading representing statistical uncertainties), cut at various galactic latitudes (as labelled). For comparison, we also show the predictions of four possible geometric models (not normalized to the data): a full sphere (yellow), a very thick shell (thickness, 4 kpc; brown), a thick shell (thickness, 2 kpc; cyan) and a thin shell (thickness, 0.2 kpc; green). The thick shell (cyan) is the most consistent with the data (see Extended Data Fig. 2 for a two-dimensional projection of this model). The region indicated by the white rectangle is where a preliminary spectral analysis was performed to constrain the line-of-sight absorption column density towards the southern eROSITA bubbles.
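The four geometric models in panel b can be illustrated with a chord-length calculation: for optically thin gas of uniform emissivity, the projected brightness is proportional to the path length through the emitting shell. The sketch below is a simplified illustration, not the authors' model. It assumes parallel lines of sight (a distant observer) and uniform emissivity, neither of which holds exactly for bubbles seen from inside the Galaxy. The 7-kpc outer radius and the shell thicknesses come from the text and caption; the function name is hypothetical.

```python
import math


def shell_path_length(b, r_out, thickness):
    """Path length through a spherical shell at impact parameter b.

    All arguments share one length unit (kpc here). A thickness equal to
    r_out describes a filled sphere.
    """
    r_in = max(r_out - thickness, 0.0)
    if b >= r_out:
        return 0.0  # the line of sight misses the shell entirely
    outer_chord = 2.0 * math.sqrt(r_out**2 - b**2)
    # Subtract the chord through the empty cavity when the ray crosses it.
    inner_chord = 2.0 * math.sqrt(r_in**2 - b**2) if b < r_in else 0.0
    return outer_chord - inner_chord


R_OUT = 7.0  # kpc, half of the 14-kpc outer diameter quoted in the text
MODELS = {
    "full sphere": 7.0,
    "very thick shell": 4.0,
    "thick shell": 2.0,
    "thin shell": 0.2,
}

if __name__ == "__main__":
    impact_parameters = (0.0, 3.0, 5.0, 6.5, 6.9)  # kpc from the projected centre
    for name, thickness in MODELS.items():
        profile = [shell_path_length(b, R_OUT, thickness) for b in impact_parameters]
        print(f"{name:>16}: " + ", ".join(f"{p:5.2f}" for p in profile))
```

In this toy geometry a filled sphere is brightest at the centre, whereas a shell is brightest near its inner edge (limb brightening); the position and sharpness of that peak is what distinguishes the thickness models.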
and Fermi bubbles are causally connected, with the Fermi bubbles driving the expansion of the eROSITA bubbles and both structures being associated with the same (gradual or instantaneous) energy release in the nuclear region of the Milky Way. In this scenario, the outer boundary of the Fermi bubbles plausibly represents a contact discontinuity that separates the shock-heated interstellar medium from the shocked outflow, and the boundary of the eROSITA bubbles is the shock that propagates through the halo gas. The pressure is thus continuous across the interface between the eROSITA and Fermi bubbles and the total thermal energies of the two features simply reflect their volumes (ignoring the effects of stratification, which may be non-negligible). Given that their characteristic sizes differ by a factor of about 2, the

Fig. 3 | Comparison of the morphology of the γ-ray and X-ray bubbles. A composite Fermi–eROSITA image is shown. The X-ray extended emission revealed by eROSITA (0.6–1-keV band; cyan) encloses the hard component of the extended gigaelectronvolt emission traditionally referred to as Fermi bubbles (red; Fermi map adapted from ref. 35), unequivocally establishing their close relation.
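The order-of-magnitude energetics quoted in this article can be checked with a few lines of arithmetic. The sketch below follows the first approach described in the Methods: a Mach 1.5 shock, upstream gas at 0.2 keV with an electron density of 4 × 10^−4 cm^−3, and two spheres of 7-kpc radius. The adiabatic index of 5/3, the mean molecular weight of 0.6 and the ratio of about 1.92 total particles per electron are assumptions of this sketch and are not stated in the text.

```python
import math

GAMMA = 5.0 / 3.0     # assumed adiabatic index (monatomic gas)
MU = 0.6              # assumed mean molecular weight of the ionized plasma
N_TOT_PER_NE = 1.92   # assumed total particles per electron
KEV = 1.602e-9        # erg per keV
M_P = 1.6726e-24      # proton mass in g
KPC = 3.086e21        # cm per kpc
MYR = 3.156e13        # s per Myr


def pressure_jump(mach, gamma=GAMMA):
    """Rankine-Hugoniot downstream-to-upstream pressure ratio."""
    return (2.0 * gamma * mach**2 - (gamma - 1.0)) / (gamma + 1.0)


def density_jump(mach, gamma=GAMMA):
    """Rankine-Hugoniot downstream-to-upstream density ratio."""
    return (gamma + 1.0) * mach**2 / ((gamma - 1.0) * mach**2 + 2.0)


# Inputs quoted in the Methods.
mach, kt_up_kev, ne_up, radius_kpc = 1.5, 0.2, 4e-4, 7.0

p_ratio = pressure_jump(mach)
t_ratio = p_ratio / density_jump(mach)  # temperature jump, from the ideal-gas law

# Thermal energy (3/2 times pressure times volume) of two shocked spheres.
p_up = N_TOT_PER_NE * ne_up * kt_up_kev * KEV            # erg cm^-3
volume = 2.0 * (4.0 / 3.0) * math.pi * (radius_kpc * KPC) ** 3
e_thermal = 1.5 * p_ratio * p_up * volume                # erg

# Shock speed, expansion time to the present size, and mean energy-release rate.
sound_speed = math.sqrt(GAMMA * kt_up_kev * KEV / (MU * M_P))  # cm s^-1
v_shock = mach * sound_speed
t_expand = radius_kpc * KPC / v_shock                    # s
power = e_thermal / t_expand                             # erg s^-1

if __name__ == "__main__":
    print(f"pressure jump      {p_ratio:.2f}")
    print(f"temperature jump   {t_ratio:.2f}")
    print(f"thermal energy     {e_thermal:.2e} erg")
    print(f"shock speed        {v_shock / 1e5:.0f} km/s")
    print(f"expansion time     {t_expand / MYR:.1f} Myr")
    print(f"mean power         {power:.2e} erg/s")
```

Under these assumptions the numbers come out close to those quoted in the Methods: a pressure jump of about 2.6, a total energy near 8 × 10^55 erg, a shock speed of roughly 340 km s^−1, an expansion time of about 20 Myr and a mean power of order 10^41 erg s^−1. The same volume scaling underlies the comparison with the Fermi bubbles: a factor of 2 in size corresponds to a factor of 2^3 = 8 in volume, and hence in thermal energy at equal pressure.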
total thermal energy of the eROSITA bubbles is almost 10 times larger than that of the Fermi bubbles.
The observed average X-ray surface brightness of (2–4) × 10^−15 erg cm^−2 s^−1 arcmin^−2 in the eROSITA bubbles (Methods), which decreases with Galactic latitude, is in broad agreement with the above scenario. The observed surface brightness, integrated over the full extent of the eROSITA bubbles, implies a total luminosity of hot X-ray-emitting plasma of L ≈ 1 × 10^39 erg s^−1.
To inflate the eROSITA bubbles, an average luminosity of the order of 10^41 erg s^−1 during the past tens of millions of years would be required, and could arise from either star-forming or AGN activity in the Galactic centre. As discussed above, the arguments in favour of each interpretation in the context of the Fermi bubbles have been debated extensively. In the case of the eROSITA bubbles, the energetics are such that they are at the limit of what the past starburst activity at the centre of the Milky Way could provide. Alternatively, the eROSITA bubbles could be inflated by a period (about 1–2 Myr) of Seyfert-like activity (L ≈ 10^43 erg s^−1) of the central supermassive black hole (Sgr A*). The long cooling time of the hot plasma is consistent with such a hypothesis.
The structures seen here are reminiscent of similar effects seen in AGN that host rapidly accreting supermassive black holes1. These can inject a vast amount of mechanical energy into the ambient gas, as revealed by radio-bright bubbles embedded in the X-ray cocoons27. This process, known as AGN feedback, is seen in objects ranging from individual early-type galaxies, such as Centaurus A (ref. 28), to massive clusters, such as A426 (Perseus)29,30, and is thought to have potentially marked effects on the evolution of galaxies. On the other hand, supernova explosions associated with star formation yield kinetic energy of the order of 10^51 erg per supernova in the ejecta (also known as stellar feedback), which may drive an outflow from the central region of a galaxy31. M82 provides a good example of the latter mechanism32. The energetics and the most salient features of the observed eROSITA bubbles are such that neither of the two mechanisms could be excluded a priori.
Irrespective of the specific source of energy, our results corroborate the notion that inactive disk galaxies, such as the Milky Way, have hot plasma in their haloes that is highly perturbed by activity in their disks, demonstrating the presence of a feedback mechanism in apparently quiescent galaxies. Galaxies are thought to grow via the slow recondensation of the hot halo plasma, which was shock-heated during the collapse of the dark-matter halo33. The cooling time of the hot plasma in the halo is comparable to the Hubble time, so the process of growing a galaxy is assumed to be steady (apart from mergers) and slow. Here we have direct evidence of the re-heating of such plasma, to considerable heights above the Galactic disk.
The detection of these X-ray bubbles was enabled by the combined capabilities of the eROSITA instrument and the Spektr-RG mission profile. More detailed analysis following accurate calibration of the instrument, substantial increases in data quality from the ongoing sky survey and follow-up observations in other parts of the electromagnetic spectrum will reveal further details of the properties of the eROSITA bubbles and the implications for the structure and evolution of galaxies, including the Milky Way.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2979-0.

1. Su, M., Slatyer, T. R. & Finkbeiner, D. P. Giant gamma-ray bubbles from Fermi-LAT: active galactic nucleus activity or bipolar Galactic wind? Astrophys. J. 724, 1044–1082 (2010).
2. Ackermann, M. et al. The spectrum and morphology of the Fermi bubbles. Astrophys. J. 793, 64 (2014).
3. Heywood, I. et al. Inflation of 430-parsec bipolar radio bubbles in the Galactic centre by an energetic event. Nature 573, 235–237 (2019).
4. Ponti, G. et al. An X-ray chimney extending hundreds of parsecs above and below the Galactic centre. Nature 567, 347–350 (2019).
5. Egger, R. & Aschenbach, B. Interaction of the Loop I supershell with the local hot bubble. Astron. Astrophys. 294, L25–L28 (1995).

6. Sofue, Y. Bipolar hypershell Galactic center starburst model: further evidence from ROSAT data and new radio and X-ray simulations. Astrophys. J. 540, 224–235 (2000).
7. Kataoka, J. et al. X-ray and gamma-ray observations of the Fermi bubbles and NPS/Loop I structures. Galaxies 6, 27 (2018).
8. Merloni, A. et al. eROSITA science book: mapping the structure of the energetic Universe. Preprint at https://arxiv.org/abs/1209.3114 (2012).
9. Gaia Collaboration. Gaia data release 2. Summary of the contents and survey properties. Astron. Astrophys. 616, A1 (2018).
10. Eisenhardt, P. R. M. et al. The CatWISE preliminary catalog: motions from WISE and NEOWISE data. Astrophys. J. Suppl. Ser. 247, 69 (2020).
11. Berkhuijsen, E. M. A survey of the continuum radiation at 820 MHz between declinations −7° and +85°. A study of the Galactic radiation and the degree of polarization with special reference to the loops and spurs. Astron. Astrophys. 14, 359–386 (1971).
12. Zubovas, K., King, A. R. & Nayakshin, S. The Milky Way’s Fermi bubbles: echoes of the last quasar outburst? Mon. Not. R. Astron. Soc. 415, L21–L25 (2011).
13. Guo, F. & Mathews, W. G. The Fermi bubbles. I. Possible evidence for recent AGN jet activity in the galaxy. Astrophys. J. 756, 181 (2012).
14. Mou, G. et al. Fermi bubbles inflated by winds launched from the hot accretion flow in Sgr A*. Astrophys. J. 790, 109 (2014).
15. Zhang, R. & Guo, F. Simulating the Fermi bubbles as forward shocks driven by AGN jets. Astrophys. J. 894, 117 (2020).
16. Crocker, R. M. & Aharonian, F. Fermi bubbles: giant, multibillion-year-old reservoirs of Galactic center cosmic rays. Phys. Rev. Lett. 106, 101102 (2011).
17. Lacki, B. C. The Fermi bubbles as starburst wind termination shocks. Mon. Not. R. Astron. Soc. 444, L39–L43 (2014).
18. Crocker, R. M., Bicknell, G. V., Taylor, A. M. & Carretti, E. A unified model of the Fermi bubbles, microwave haze, and polarized radio lobes: reverse shocks in the Galactic center’s giant outflows. Astrophys. J. 808, 107 (2015).
19. Miller, M. J. & Bregman, J. N. The interaction of the Fermi bubbles with the Milky Way’s hot gas halo. Astrophys. J. 829, 9 (2016).
20. Sofue, Y. Propagation of magnetohydrodynamic waves from the Galactic center. Origin of the 3-kpc arm and the North Polar Spur. Astron. Astrophys. 60, 327–336 (1977).
21. Lallement, R. et al. On the distance to the North Polar Spur and the local CO-H2 factor. Astron. Astrophys. 595, A131 (2016).
22. Bland-Hawthorn, J. & Cohen, M. The large-scale bipolar wind in the Galactic center. Astrophys. J. 582, 246–256 (2003).
23. Nakahira, S. et al. MAXI/SSC all-sky maps from 0.7 keV to 4 keV. Publ. Astron. Soc. Japan 72, 17 (2020).
24. Bland-Hawthorn, J. & Gerhard, O. The galaxy in context: structural, kinematic, and integrated properties. Annu. Rev. Astron. Astrophys. 54, 529–596 (2016).
25. Casandjian, J.-M. The Fermi-LAT model of interstellar emission for standard point source analysis. Preprint at https://arxiv.org/abs/1502.07210 (2015).
26. Carretti, E. et al. Giant magnetized outflows from the centre of the Milky Way. Nature 493, 66–69 (2013).
27. Böhringer, H. et al. A ROSAT HRI study of the interaction of the X-ray emitting gas and radio lobes of NGC 1275. Mon. Not. R. Astron. Soc. 264, L25–L28 (1993).
28. Kraft, R. et al. X-ray emission from the hot interstellar medium and southwest radio lobe of the nearby radio galaxy Centaurus A. Astrophys. J. 592, 129–146 (2003).
29. Churazov, E. et al. Asymmetric, arc minute scale structures around NGC 1275. Astron. Astrophys. 356, 788–794 (2000).
30. Fabian, A. C. et al. Chandra imaging of the complex X-ray core of the Perseus cluster. Mon. Not. R. Astron. Soc. 318, L65–L68 (2000).
31. Strickland, D. K. & Stevens, I. R. Starburst-driven galactic winds – I. Energetics and intrinsic X-ray emission. Mon. Not. R. Astron. Soc. 314, 511–545 (2000).
32. Rieke, G. H. et al. The nature of the nuclear sources in M82 and NGC 253. Astrophys. J. 238, 24–40 (1980).
33. Tumlinson, J., Peeples, M. S. & Werk, J. K. The circumgalactic medium. Annu. Rev. Astron. Astrophys. 55, 389–432 (2017).
34. Sanders, J. et al. Annotated version of the eROSITA first all-sky image. http://www.mpe.mpg.de/7461950/erass1-presskit (2020).
35. Selig, M., Vacca, V., Oppermann, N. & Enßlin, T. A. The denoised, deconvolved, and decomposed Fermi γ-ray sky. An application of the D3PO algorithm. Astron. Astrophys. 581, 126 (2015).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© The Author(s), under exclusive licence to Springer Nature Limited 2020

Methods

eROSITA aboard Spektr-RG
On 13 July 2019, the X-ray observatory Spektrum-Roentgen-Gamma (Spektr-RG or SRG; a joint Russian–German mission; R. Sunyaev et al., manuscript in preparation) was launched into space from the Russian cosmodrome Baikonur in Kazakhstan. The Navigator spacecraft platform for SRG (NPO Lavochkin) and launcher (Proton M + Block DM03 upper stage) were both provided by the Russian space agency Roscosmos. The observatory carries two X-ray telescopes: ART-XC (astronomical roentgen telescope X-ray concentrator; M. Pavlinsky et al., manuscript in preparation), a Russian hard-X-ray instrument, and the German-led eROSITA (extended roentgen survey imaging telescope array)36. eROSITA was designed to address fundamental questions of astrophysical cosmology such as the nature of dark energy and dark matter, the origin and growth of supermassive black holes, and the expansion of discovery space for rare objects including those of unknown nature8.
eROSITA consists of seven identical and coaligned X-ray telescopes housed in a common optical bench that connects seven mirror modules with their associated cameras. The dimensions of the telescope structure are approximately 1.9 m diameter and 3.2 m height; the total mass is 810 kg.
The mirror modules comprise 54 paraboloid or hyperboloid mirror shells in a Wolter-I geometry, with an outer diameter of 360 mm and a common focal length of 1,600 mm. The on-axis spatial resolution of all mirror modules is around 16 arcsec (half energy width) at 1.5 keV. X-ray baffles in front of the mirrors effectively suppress stray X-ray light from sources outside the field of view, while magnetic electron deflectors behind the mirrors help to reduce background that arises from low-energy cosmic-ray electrons.
Each mirror module has a p–n charge-coupled device (pnCCD) camera at its focus, with 384 × 384 pixels on a single-chip image area of 28.8 mm × 28.8 mm, which provides a circular field of view of 1.03° diameter. The nominal integration time for all eROSITA cameras is 50 ms. For calibration purposes, each camera has its own filter wheel with a radioactive ^55Fe source and an aluminium or titanium target that provides three spectral lines, at 5.9 keV (Mn Kα), 4.5 keV (Ti Kα) and 1.5 keV (Al Kα). During science operations, the CCDs are passively cooled down to −85 °C via heat pipes and radiators. Noteworthy features of the detectors include the low and very stable in-orbit background and excellent spectral response at low energies.
Two star-trackers are mounted on eROSITA to ensure accurate boresight and attitude reconstruction. Thanks to these and the excellent stability of the spacecraft, point-source positional accuracy of about 4″ (1σ) is achieved. The effective area at 1 keV of all seven telescopes together is similar to that of all three XMM-Newton EPIC cameras; the grasp, defined as the product of field of view and average effective area, is about four times larger. Together with the ability of the spacecraft to continuously scan large areas, this makes eROSITA an extremely fast survey machine, capable of uniformly imaging large swaths of sky.
On its three-month cruise to its halo orbit around Lagrangian point L2 of the Sun–Earth system, all SRG systems were put into operation and checked. The instruments were calibrated and their performance verified with a series of scientific observations.

SRG–eROSITA observations
The first SRG all-sky survey was conducted between 13 December 2019 and 11 June 2020. Eight all-sky surveys are planned in total, each delivering an average exposure with eROSITA of about 200 s/cos(lat), where lat is the ecliptic latitude; the 1-square-degree area around the ecliptic poles is revisited every four hours, accumulating an exposure of about 30 ks per survey.
During the all-sky survey, the spacecraft rotated continuously with a scan rate of 90° h^−1, giving a four-hour period per revolution. The rotation axis is oriented to the neighbourhood of the Sun, with an average progression in ecliptic longitude of about 1° day^−1; it thus completes one all-sky survey in half a year. The scan speed guarantees that the angular resolution is not degraded by smearing of the photons during the 50-ms CCD read-out cycle, and provides sufficient overlap between individual scans to enable source variability analysis and homogeneous survey exposure. Since the start of the survey, the eROSITA cameras have been operating with high efficiency, with more than 96% of the observing time resulting in scientifically useful data. The only substantial loss of efficiency is due to single-event upsets in parts of the firmware that control the seven cameras, most probably caused by heavy particles (cosmic rays). Mitigation strategies and procedures have since been put in place by the operations teams. This, combined with the very stable particle background along the large halo L2 orbit of SRG and the flexible mission planning, has resulted in a gap-free coverage for the first all-sky survey.
As SRG scans the sky, for each X-ray photon detected the eROSITA instrument records its position on the detector, energy and time tag. These data, along with detector housekeeping and star-tracker information, are telemetered to ground stations in Russia daily. They are immediately sent from these ground stations, via NPOL Lavochkin and IKI (the Space Research Institute of the Russian Academy of Sciences) to the Max Planck Institute for Extraterrestrial Physics (MPE), where they are checked and converted to FITS (flexible image transport system) format. These files are then cut into 4-h intervals, the duration of a great circle scan. A pipeline based on the eROSITA standard analysis software system (eSASS) determines good time intervals, dead times and corrupted events and frames, masks bad pixels, and applies pattern recognition and energy calibration. Finally, star-tracker and gyro data are used to assign celestial coordinates to each reconstructed X-ray photon. These (time-ordered) photons are then projected into sky tiles of size 3.6° × 3.6°, where images and exposure maps are created for downstream analysis.
Individual sources are detected in the sky tiles using a three-step procedure. First, a local sliding-window detection algorithm identifies enhancements, which are excised from the images. Second, adaptively filtered background maps of the source-free regions are created. Third, the sliding-window detection is repeated, and the identified excesses with respect to the background maps provide an input list of source candidates from which a maximum-likelihood point-spread-function fitting algorithm determines the best-fitting source parameters and detection likelihoods. Data from all seven telescope modules are merged in this approach.

Generation of all-sky maps
To produce the sky map shown in Fig. 1, we projected the detected X-ray photons onto the sky, applying smoothing using a two-dimensional Gaussian with FWHM = 10 arcmin, significantly larger than either the native resolution of the map (pixel size of 4″) or the point-spread function of the instrument (FWHM ≈ 12″).
Three sky maps were made in three energy bands (0.3–0.6 keV, 0.6–1.0 keV and 1.0–2.3 keV), which split the X-ray events according to the calibrated and recombined amplitude values from the CCDs. Lower-energy events (recorded down to 0.12 keV) were discarded, owing to the presence of artefacts generated by yet-uncalibrated detector noise; higher energy events (2.3–8 keV) are not shown, owing to the lower sensitivity of eROSITA and the effects of the particle background, which greatly reduce the signal-to-noise ratio in this harder band.
The three images were reprojected into Galactic coordinates using a Hammer–Aitoff projection. This projection maintains equal area on the sky. We similarly computed the total time for which each pixel on the sky was observed during the survey, applying the same Gaussian smoothing. This exposure time takes into account various factors that reduce observed photon rates, such as the number of cameras active at any one time, periods removed by screening software, the
effect of vignetting as a source moves away from the optical axis of the telescopes, and bad pixels in the detectors.
Images of the three broadband maps were made using a logarithmic scaling between the upper and lower bounds, chosen to show the majority of the large-scale structure. These bounds give dynamical ranges of 13, 35 and 25. The three maps were then combined as red (0.3–0.6 keV), green (0.6–1.0 keV) and blue (1.0–2.3 keV) channels to make an RGB image. Brightness and contrast in each channel were adjusted manually.
The 0.3–0.6-keV band (red) reveals large, soft structures such as the 20°-wide Monogem supernova remnant and the Eridanus superbubble located southwest of the Orion nebula, which cover a large fraction of the X-ray sky. Further towards the direction of the Galactic centre, the Vela supernova remnant, together with the overlapping Puppis A and Vela–Junior remnants, appear as prominent bright and extended X-ray sources. Almost symmetric to Vela with respect to the Galactic centre, the Cygnus superbubble and Cygnus loop are most prominent, with emission connecting Cygnus and distant Draco regions. Between these, just north of the Galactic centre, the brightest persistent X-ray source in the sky (Sco X-1) can be seen; owing to its exceptional brightness, a stray-light halo around the source gives it the appearance of an extended object34.
Although a hint of extended emission associated with the eROSITA bubbles is directly evident in the all-sky map (Fig. 1), its appearance can be enhanced (Fig. 2). First, it appears most clearly in the 0.6–1.0 keV energy band, where both the effective area of eROSITA and the intensity of the roughly 0.3-keV hot bubbles peak, so we use only this band in the analysis. Further improvement may be achieved by removing the contribution of the point sources from the map. Because point-source analysis in the survey is still ongoing, whereas the eROSITA point-spread-function ground calibration is verified against in-flight data, this task was accomplished using the full sky map obtained as described above, using the Montage, scikit-image and astropy packages. In particular, point sources were identified and masked on small areas of the map reprojected to a set of non-overlapping tiles (to avoid elongation of point sources). This was accomplished using a difference-of-Gaussians filter with a radius corresponding to the mean size of a point source in the image. The filtered image was then thresholded using the levels calculated with the threshold_local function in scikit-image (which allows us to leave all large-scale structures unmasked) and the resulting mask was applied to the observed image. The masked image was convolved using a Gaussian kernel with size of five pixels (astropy.convolve) and reprojected to the initial Hammer–Aitoff projection. Finally, the logarithmic intensity scale and cubehelix colour scale were then applied to the image to allow a large
respectively. Assuming a distance of 10.6 kpc, we derive luminosities of 6.0 × 10^38 erg s^−1 and 4.3 × 10^38 erg s^−1.
The total energy content of the eROSITA bubbles may be estimated from their sizes and estimates of the density and temperature of the X-ray-emitting gas. We considered two approaches. First, we assumed a Mach number of the shock of M ≈ 1.5, as follows from the Rankine–Hugoniot condition for the temperature increase from about 0.2 keV outside the bubbles to around 0.3 keV inside7,19, and adopted an electron density of the upstream halo gas at a distance of 10 kpc from the centre of roughly 4 × 10^−4 cm^−3 (ref. 19), on the basis of the spectral analysis of numerous XMM-Newton and Suzaku lines of sight. For such a Mach number, the downstream pressure is a factor of about 2.6 larger than the upstream pressure, yielding a total energy of two 7-kpc-radius spheres of roughly 8 × 10^55 erg. The contribution of the kinetic energy of the expanding gas to the total energy is modest. Second, the density of the gas in the eROSITA bubbles may be estimated from the observed X-ray surface brightness (or X-ray luminosity), assuming a plasma with kT ≈ 0.3 keV, a metallicity roughly 0.2 times solar37,38 and size along the line of sight of about 5 kpc (or assuming a shell-like geometry of the bubbles; Fig. 2, Extended Data Fig. 2). Assuming that the X-ray emission from the eROSITA bubbles comes from a shell with inner radius of 3 kpc and outer radius of 7 kpc, we estimate the electron density of the hot emitting plasma to be 0.002 cm^−3 and the thermal energy within each bubble to be E_th ≈ 1.3 × 10^56 erg.
The velocity of the M ≈ 1.5 shock in a 0.2-keV gas is approximately 340 km s^−1. This velocity implies a characteristic expansion time to the present size of around 20 Myr, which translates to an energy-release rate of roughly (1–3) × 10^41 erg s^−1. The cooling time39 of such tenuous hot gas (density n = 0.002 cm^−3, log T = 6.5) is approximately 1.9 × 10^8 yr, much longer than the estimated age of the bubbles. Therefore, once heated, the interior of the bubbles will be visible in X-rays for a long time, even after the energy release has ceased.

Data availability
The datasets analysed during this study are not yet publicly available. Their proprietary scientific exploitation rights were granted by the project funding agencies (Roscosmos and DLR) to two consortia led by MPE (Germany) and IKI (Russia), respectively. The SRG–eROSITA all-sky survey data will be released publicly after a minimum period of 2 years.

36. Predehl, M. et al. The eROSITA X-ray telescope on SRG. Astron. Astrophys. https://doi.org/10.1051/0004-6361/202039313 (2020).
37. Kataoka, J. et al. Suzaku observations of the diffuse X-ray emission across the Fermi bubbles’ edges. Astrophys. J. 779, 57 (2013).
38. Ursino, E., Galeazzi, M. & Liu, W. Studying the Interstellar medium and the inner region of
dynamic range of observed intensities to be displayed in print. All local NPS/LOOP 1 with shadow observations toward MBM36. Astrophys. J. 816, 33 (2016).
39. Sutherland, M. S. & Dopita, M. A. Cooling functions for low-density astrophysical plasmas.
manipulations described above affect only the scales comparable with Astrophys. J. Suppl. Ser. 88, 253–327 (1993).
the point-source size and do not affect the large-scale structures dis-
cussed in the paper. Acknowledgements This work is based on data from eROSITA, the primary instrument aboard
To compare the morphology of the eROSITA and Fermi bubbles SRG, a joint Russian–German science mission supported by the Russian Space Agency
(Fig. 3), we used the image of a bubble-like component35 reprojected (Roskosmos), in the interests of the Russian Academy of Sciences, represented by its Space
Research Institute (IKI), and the Deutsches Zentrum für Luft- und Raumfahrt (DLR). The SRG
into a Hammer–Aitoff projection. The final composite was obtained spacecraft was built by Lavochkin Association (NPOL) and its subcontractors, and is operated
by applying sequential cyan and red colour scales for the eROSITA and by NPOL with support from IKI and the Max Planck Institute for Extraterrestrial Physics (MPE).
Fermi images, respectively. The development and construction of the eROSITA X-ray instrument was led by MPE, with
contributions from the Dr. Karl Remeis Observatory Bamberg and ECAP (FAU
Erlangen-Nuernberg), the University of Hamburg Observatory, the Leibniz Institute for
Estimating the energetics of the eROSITA bubbles Astrophysics Potsdam (AIP), and the Institute for Astronomy and Astrophysics of the University
In the 0.6–1.0-keV band, average count rates within the northern of Tübingen, with the support of DLR and the Max Planck Society. The Argelander Institute for
Astronomy of the University of Bonn and the Ludwig Maximilians Universität Munich also
and southern eROSITA bubbles of 0.0038 photons s−1 arcmin−2 and participated in the science preparation for eROSITA. The eROSITA data shown here were
0.0026 photons s−1 arcmin−2, respectively, were observed. These processed using the eSASS/NRTA software system developed by the German eROSITA
count rates translate to fluxes of 3.6 × 10−15 erg cm−2 s−1 arcmin−2 and consortium. We thank the entire eROSITA collaboration team, in Germany and Russia, who,
over many years, have given fundamental contributions to the development of the mission, the
2.5 × 10−15 erg cm−2 s−1 arcmin−2, assuming a plasma with temperature instrument and the science exploitation of the eROSITA data. SRG–eROSITA data processing
kT = 0.3 keV (here k is the Boltzmann constant) and abundances 0.2 and calibration and data analyses were performed by a large number of collaboration
times solar19,21,37,38. Assuming a projected area of the eROSITA bubbles members in both the German and Russian teams, who also discussed and approved the
scientific results presented here. This research made use of Montage. It is funded by the
of 35° × 35° × π for each bubble, the total flux is 5 × 10−8 erg cm−2 s−1 National Science Foundation under grant number ACI-1440620, and was previously funded by
and 3.4 × 10−8 erg cm−2 s−1 for the northern and southern bubbles, NASA’s Earth Science Technology Office, Computation Technologies Project, under
cooperative agreement number NCC5-626 between NASA and the California Institute of Technology. G.P. acknowledges funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement number 865637).

Author contributions H.B., M.F., C.M. and J.S.S. developed software to process the eROSITA data and processed the German proprietary data that resulted in the all-sky maps. E.C., M.G., I.K., and P.M. processed the Russian proprietary data. H.B., E.C., M.G., C.M. and J.S.S. performed the analysis that resulted in Fig. 1. V.D., I.K. and J.S.S. performed the image processing that resulted in Figs. 2, 3 and Extended Data Figs. 1, 2. The majority of the text was written by P.P., W.B., M.F., M.G., E.C., G.P., A.W.S., M.S., H.B. and V.D. V.D. and E.C. worked on Fig. 2; Fig. 3 was created by I.K. and V.D. Extended Data Fig. 1 was prepared by A.M. with the support of an MPE graphic design expert. K.N., A.M. and R.A.S. contributed to writing and editing the manuscript. The above-named authors all contributed to the discussion and interpretation of the results and their implications. The remaining co-authors made important contributions to SRG mission planning and operations, eROSITA data acquisition and analysis, and software development for SRG–eROSITA.

Competing interests The authors declare no competing interests.

Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2979-0.
Correspondence and requests for materials should be addressed to P.P., R.A.S., E.C., M.G., A.M. or K.N.
Peer review information Nature thanks Jun Kataoka and Roland Crocker for their contribution to the peer review of this work. Peer reviewer reports are available.
Reprints and permissions information is available at http://www.nature.com/reprints.
Extended Data Fig. 1 | Schematic of the eROSITA and Fermi bubbles. Schematic of the geometry of the eROSITA bubbles (EBs; yellow) and Fermi bubbles (FBs; purple) with respect to the Galaxy and the Solar System. The approximate sizes of these structures, as derived from our analysis, are also marked (green and purple arrows).
Extended Data Fig. 2 | Soft-X-ray data compared to a thick-shell model for the eROSITA bubbles. Comparison between the thick-shell model (cyan line in Fig. 2) and eROSITA data (0.6–1.0-keV band) in a Lambert zenithal equal-area projection. The model is in red; the data are in cyan. The northern bubble is shown on the left (N); the southern bubble is shown on the right (S). The northern bubble is spherical, with an outer radius of 7 kpc and an inner radius of 5 kpc. It is slightly offset from the vertical above the Galactic centre. The southern shell is instead an ellipse, slightly elongated in the north–south direction (semi-major axis is 7 kpc; semi-minor axis 4.9 kpc).
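
As a rough arithmetic cross-check of the energetics quoted in the Methods, the sketch below recomputes the total fluxes from the per-arcmin² fluxes and the 35° × 35° × π projected area, and the expansion time from the 7-kpc outer radius and the shock velocity of approximately 340 km s⁻¹. The function names and the conversion constants are assumptions introduced here for illustration; this is not the authors' analysis code.

```python
import math

KPC_IN_KM = 3.0857e16       # kilometres per kiloparsec (assumed conversion constant)
SECONDS_PER_YEAR = 3.156e7  # seconds per year (assumed conversion constant)


def total_flux(flux_per_arcmin2, half_size_deg=35.0):
    """Total flux (erg cm^-2 s^-1) over a projected area of pi * half_size_deg^2."""
    area_arcmin2 = math.pi * half_size_deg ** 2 * 3600.0  # 1 deg^2 = 3600 arcmin^2
    return flux_per_arcmin2 * area_arcmin2


def expansion_time_myr(radius_kpc, velocity_km_s):
    """Characteristic expansion time radius / velocity, in Myr."""
    seconds = radius_kpc * KPC_IN_KM / velocity_km_s
    return seconds / SECONDS_PER_YEAR / 1e6


north = total_flux(3.6e-15)           # northern bubble
south = total_flux(2.5e-15)           # southern bubble
age = expansion_time_myr(7.0, 340.0)  # outer radius 7 kpc, shock at 340 km/s
print(f"north: {north:.2e}  south: {south:.2e}  age: {age:.1f} Myr")
```

Under these assumptions the results land close to the quoted 5 × 10⁻⁸ and 3.4 × 10⁻⁸ erg cm⁻² s⁻¹ and to the expansion time of around 20 Myr.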
Article
Unveiling the strong interaction among hadrons at the LHC
https://doi.org/10.1038/s41586-020-3001-6
ALICE Collaboration*
Received: 3 June 2020
Accepted: 20 October 2020
Published online: 9 December 2020
Open access

One of the key challenges for nuclear physics today is to understand from first principles the effective interaction between hadrons with different quark content. First successes have been achieved using techniques that solve the dynamics of quarks and gluons on discrete space-time lattices1,2. Experimentally, the dynamics of the strong interaction have been studied by scattering hadrons off each other. Such scattering experiments are difficult or impossible for unstable hadrons3–6 and so high-quality measurements exist only for hadrons containing up and down quarks7. Here we demonstrate that measuring correlations in the momentum space between hadron pairs8–12 produced in ultrarelativistic proton–proton collisions at the CERN Large Hadron Collider (LHC) provides a precise method with which to obtain the missing information on the interaction dynamics between any pair of unstable hadrons. Specifically, we discuss the case of the interaction of baryons containing strange quarks (hyperons). We demonstrate how, using precision measurements of proton–omega baryon correlations, the effect of the strong interaction for this hadron–hadron pair can be studied with precision similar to, and compared with, predictions from lattice calculations13,14. The large number of hyperons identified in proton–proton collisions at the LHC, together with accurate modelling15 of the small (approximately one femtometre) inter-particle distance and exact predictions for the correlation functions, enables a detailed determination of the short-range part of the nucleon-hyperon interaction.

Baryons are composite objects formed by three valence quarks bound together by means of the strong interaction mediated through the emission and absorption of gluons. Between baryons, the strong interaction leads to a residual force and the most common example is the effective strong force among nucleons (N), that is, baryons composed of up (u) and down (d) quarks: proton (p) = uud and neutron (n) = ddu. This force is responsible for the existence of a neutron–proton bound state, the deuteron, and manifests itself in scattering experiments7 and through the existence of atomic nuclei. So far, our understanding of the nucleon–nucleon strong interaction relies heavily on effective theories16, where the degrees of freedom are nucleons. These effective theories are constrained by scattering measurements and are successfully used in the description of nuclear properties17,18.

The fundamental theory of the strong interaction is quantum chromodynamics (QCD), in which quarks and gluons are the degrees of freedom. One of the current challenges in nuclear physics is to calculate the strong interaction among hadrons starting from first principles. Perturbative techniques are used to calculate strong-interaction phenomena in high-energy collisions with a level of precision of a few per cent19. For baryon–baryon interactions at low energy such techniques cannot be employed; however, numerical solutions on a finite space-time lattice have been used to calculate scattering parameters among nucleons and the properties of light nuclei1,2. Such approaches are still limited: they do not yet reproduce the properties of the deuteron20 and do not predict physical values for the masses of light hadrons21.

Baryons containing strange (s) quarks, exclusively or combined with u and d quarks, are called hyperons (Y) and are denoted by uppercase Greek letters: Λ = uds, Σ⁰ = uds, Ξ− = dss, Ω− = sss. Experimentally, little is known about Y–N and Y–Y interactions, but recently, major steps forward in their understanding have been made using lattice QCD approaches13,14,22. The predictions available for hyperons are characterized by smaller uncertainties because the lattice calculation becomes more stable for quarks with larger mass, such as the s quark. In particular, robust results are obtained for interactions involving the heaviest hyperons, such as Ξ and Ω, and precise measurements of the p–Ξ− and p–Ω− interactions are instrumental in validating these calculations.

From an experimental point of view, the existence of nuclei in which a nucleon is replaced by a hyperon (hypernuclei) demonstrates the presence of an attractive strong Λ–N interaction23 and indicates the possibility of binding a Ξ− to a nucleus24,25. A direct and more precise measurement of the Y–N interaction requires scattering experiments, which are particularly challenging to perform because hyperons are short-lived and travel only a few centimetres before decaying. Previous experiments with Λ and Σ hyperons on proton targets3–5 delivered results that were two orders of magnitude less precise than those for nucleons, and such experiments with Ξ (ref. 6) and Ω beams are even more challenging. The measurement of the Y–N and Y–Y interactions has further important implications for the possible formation of a
*A list of members and their affiliations appears at the end of the paper.

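[Figure 1, panels a–d. a, Emission source S(r*) and a pair with momenta p1 and p2 at relative distance r*. b, Interaction potential V(r*) (MeV) versus r* (fm), attractive and repulsive cases; the Schrödinger equation gives the two-particle wavefunction ψ(k*, r*). c, Correlation function: $C(k^*) = \int S(r^*)\,|\psi(\mathbf{k}^*, \mathbf{r}^*)|^2\,\mathrm{d}^3r^* = \xi(k^*)\,N_{\mathrm{same}}(k^*)/N_{\mathrm{mixed}}(k^*)$. d, C(k*) versus k* (MeV/c), attractive and repulsive cases.]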
Fig. 1 | Schematic representation of the correlation method. a, A collision of two protons generates a particle source S(r*) from which a hadron–hadron pair with momenta p1 and p2 emerges at a relative distance r* and can undergo a final-state interaction before being detected. Consequently, the relative momentum k* is either reduced or increased via an attractive or a repulsive interaction, respectively. b, Example of attractive (green) and repulsive (dotted red) interaction potentials, V(r*), between two hadrons, as a function of their relative distance. Given a certain potential, a non-relativistic Schrödinger equation is used to obtain the corresponding two-particle wavefunction, ψ(k*, r*). c, The equation of the calculated (second term) and measured (third term) correlation function C(k*), where Nsame(k*) and Nmixed(k*) represent the k* distributions of hadron–hadron pairs produced in the same and in different collisions, respectively, and ξ(k*) denotes the corrections for experimental effects. d, Sketch of the resulting shape of C(k*). The value of the correlation function is proportional to the interaction strength. It is above unity for an attractive (green) potential, and between zero and unity for a repulsive (dotted red) potential.
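
The calculated form of the correlation function in panel c, an integral of the source S(r*) weighted by |ψ|², can be illustrated numerically. The sketch below is a toy under stated assumptions: the Gaussian source profile and its normalisation, the angle-averaged |ψ|² that depends only on the pair distance, and the function `toy_attraction` are all hypothetical choices made here, not the Schrödinger-equation solution used in the paper. The k* dependence, which enters through ψ, is dropped.

```python
import math


def gaussian_source(r_fm, r0_fm):
    """Assumed Gaussian pair-separation profile, normalised so that 4*pi*int r^2 S dr = 1."""
    norm = (4.0 * math.pi * r0_fm ** 2) ** -1.5
    return norm * math.exp(-r_fm ** 2 / (4.0 * r0_fm ** 2))


def correlation(psi_sq, r0_fm, r_max_fm=30.0, steps=30000):
    """Midpoint-rule estimate of C = 4*pi * int r^2 S(r) |psi(r)|^2 dr for an
    angle-averaged |psi|^2 that depends only on the pair distance r."""
    dr = r_max_fm / steps
    total = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        total += r ** 2 * gaussian_source(r, r0_fm) * psi_sq(r)
    return 4.0 * math.pi * total * dr


def free_pair(r_fm):
    """No final-state interaction: |psi|^2 = 1 everywhere."""
    return 1.0


def toy_attraction(r_fm):
    """Hypothetical short-range enhancement of |psi|^2; not a Schrödinger solution."""
    return 1.0 + 2.0 * math.exp(-r_fm / 1.0)


c_free = correlation(free_pair, r0_fm=0.95)
c_toy = correlation(toy_attraction, r0_fm=0.95)
print(f"free pair: {c_free:.4f}, toy attraction: {c_toy:.4f}")
```

With no interaction the integral reduces to the source normalisation, so C stays at unity; any enhancement of |ψ|² at distances comparable to the source size pushes C above unity, which is the qualitative behaviour sketched in panel d.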
Y–N or Y–Y bound state. Although numerous theoretical predictions exist13,26–30, so far no clear evidence for any such bound states has been found, despite many experimental searches31–35.

Additionally, a precise knowledge of the Y–N and Y–Y interactions has important consequences for the physics of neutron stars. Indeed, the structure of the innermost core of neutron stars is still completely unknown and hyperons could appear in such environments depending on the Y–N and Y–Y interactions36. Real progress in this area calls for new experimental methods.

Studies of the Y–N interaction via correlations have been pioneered by the HADES collaboration37. Recently, the ALICE Collaboration has demonstrated that p–p and p–Pb collisions at the LHC are best suited to study the N–N and several Y–N, Y–Y interactions precisely8–12. Indeed, the collision energy and rate available at the LHC open the phase space for an abundant production of any strange hadron38, and the capabilities of the ALICE detector for particle identification and the momentum resolution (with values below 1% for transverse momentum pT < 1 GeV/c) facilitate the investigation of correlations in momentum space. These correlations reflect the properties of the interaction and hence can be used to test theoretical predictions by solving the Schrödinger equation for proton–hyperon collisions39. A fundamental advantage of p–p and p–Pb collisions at LHC energies is the fact that all hadrons originate from very small space-time volumes, with typical inter-hadron distances of about 1 fm. These small distances are linked through the uncertainty principle to a large range of the relative momentum (up to 200 MeV/c) for the baryon pair and enable us to test short-range interactions. Additionally, detailed modelling of a common source for all produced baryons15 allows us to determine accurately the source parameters.

Similar studies were carried out in ultrarelativistic Au–Au collisions at a centre-of-mass energy of 200 GeV per nucleon pair by the STAR collaboration for Λ–Λ (refs 40, 41) and p–Ω− (ref. 42) interactions. This collision system leads to comparatively large particle emitting sources of 3–5 fm. The resulting relative momentum range is below 40 MeV/c, implying reduced sensitivity to interactions at distances shorter than 1 fm.

In this work, we present a precision study of the most exotic among the proton–hyperon interactions, obtained via the p–Ω− correlation function in p–p collisions at a centre-of-mass energy √s = 13 TeV at the LHC. The comparison of the measured correlation function with first-principle calculations13 and with a new precision measurement of the p–Ξ− correlation in the same collision system provides the first observation of the effect of the strong interaction for the p–Ω− pair. The implications of the measured correlations for a possible p–Ω− bound state are also discussed. These experimental results challenge the interpretation of the data in terms of lattice QCD as the precision of the data improves.

Our measurement opens a new chapter for experimental methods in hadron physics with the potential to pin down the strong interaction for all known proton–hyperon pairs.

Analysis of the correlation function
Figure 1 shows a schematic representation of the correlation method used in this analysis. The correlation function can be expressed theoretically43,44 as $C(k^*) = \int \mathrm{d}^3r^*\, S(r^*)\,|\psi(\mathbf{k}^*, \mathbf{r}^*)|^2$, where k* and r* are the relative momentum and relative distance of the pair of interest. S(r*) is the distribution of the distance r* = |r*| at which particles are emitted (defining the source size), ψ(k*, r*) represents the wavefunction of the relative motion for the pair of interest and k* = |k*| is the reduced relative momentum of the pair ($k^* = |\mathbf{p}^*_2 - \mathbf{p}^*_1|/2$). Given an interaction potential between two hadrons as a function of their relative distance, a non-relativistic Schrödinger equation can be used39 to obtain the corresponding wavefunction and hence also predict the expected correlation function. The choice of a non-relativistic Schrödinger equation is motivated by the fact that the typical relative momenta relevant for the strong final-state interaction have a maximal value of 200 MeV/c. Experimentally, this correlation function is computed as $C(k^*) = \xi(k^*)\,[N_{\mathrm{same}}(k^*)/N_{\mathrm{mixed}}(k^*)]$, where ξ(k*) denotes the corrections for experimental effects, Nsame(k*) is the number of pairs with a given k* obtained by combining particles produced in the same collision (event), which constitute a sample of correlated pairs, and Nmixed(k*) is

[Figure 2: sketch of the decay Ω− → Λ + Κ− and invariant-mass distribution dN/dm ((GeV/c²)⁻¹) versus mΛΚ (GeV/c²); legend: ALICE data, signal + background fit, background fit; mΩ = 1.6725 GeV/c², σ = 1.82 MeV/c².]
Fig. 2 | Reconstruction of the Ω− and Ω̄+ signals. Sketch of the weak decay of Ω− into a Λ and a Κ−, and measured invariant mass distribution (blue points) of ΛΚ− and Λ̄K+ combinations. The dotted red line represents the fit to the data including signal and background, and the black dotted line the background alone. The contamination from misidentification is ≤5%.

the number of uncorrelated pairs with the same k*, obtained by combining particles produced in different collisions (the so-called mixed-event technique). Figure 1d shows how an attractive or repulsive interaction is mapped into the correlation function. For an attractive interaction the magnitude of the correlation function will be above unity for small values of k*, whereas for a repulsive interaction it will be between zero and unity. In the former case, the presence of a bound state would create a depletion of the correlation function with a depth increasing with increasing binding energy.
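
The experimental definition above, the ratio of same-event to mixed-event pair counts per k* bin multiplied by a correction ξ(k*), can be sketched as follows. This is a minimal illustration, not the ALICE analysis code: the function names, the toy pair lists and in particular the rescaling of the mixed sample inside a `norm_range` window (assumed here to be a region where no correlation is expected) are assumptions introduced for the example.

```python
def histogram(values, bin_edges):
    """Count how many values fall into each [lo, hi) bin."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if bin_edges[i] <= v < bin_edges[i + 1]:
                counts[i] += 1
                break
    return counts


def correlation_function(k_same, k_mixed, bin_edges, norm_range, xi=None):
    """C(k*) = xi(k*) * N_same(k*) / N_mixed(k*) per bin (None where N_mixed = 0).

    The mixed sample is rescaled so that the pair counts agree inside norm_range,
    an assumed normalisation window where no correlation is expected.
    """
    n_same = histogram(k_same, bin_edges)
    n_mixed = histogram(k_mixed, bin_edges)
    lo, hi = norm_range
    same_norm = sum(1 for k in k_same if lo <= k < hi)
    mixed_norm = sum(1 for k in k_mixed if lo <= k < hi)
    scale = same_norm / mixed_norm
    xi = xi or [1.0] * len(n_same)  # no experimental correction by default
    return [x * s / (m * scale) if m > 0 else None
            for x, s, m in zip(xi, n_same, n_mixed)]


edges = [0.0, 100.0, 200.0, 300.0]                    # k* bins in MeV/c
k_same = [50.0] * 30 + [150.0] * 10 + [250.0] * 10    # toy same-event pairs
k_mixed = [50.0] * 20 + [150.0] * 20 + [250.0] * 20   # toy mixed-event pairs
c_of_k = correlation_function(k_same, k_mixed, edges, norm_range=(200.0, 300.0))
print(c_of_k)  # prints [3.0, 1.0, 1.0]
```

In this toy input the excess of same-event pairs in the lowest k* bin gives a value above unity there, the signature described above for an attractive interaction.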
[Figure 3: panels a (p–Ξ−) and b (p–Ω−), C(k*) versus k* (MeV/c); legend: ALICE data, Coulomb, Coulomb + p–Ξ− HAL QCD, Coulomb + p–Ω− HAL QCD elastic, Coulomb + p–Ω− HAL QCD elastic + inelastic.]
Fig. 3 | Experimental p–Ξ− and p–Ω− correlation functions. a, b, Measured p–Ξ− (a) and p–Ω− (b) correlation functions in high multiplicity p–p collisions at √s = 13 TeV. The experimental data are shown as black symbols. The black vertical bars and the grey boxes represent the statistical and systematic uncertainties. The square brackets show the bin width and the horizontal black lines represent the statistical uncertainty in the determination of the mean k* for each bin. The measurements are compared with theoretical predictions, shown as coloured bands, that assume either Coulomb or Coulomb + strong HAL QCD interactions. For the p–Ω− system the orange band represents the prediction considering only the elastic contributions and the blue band represents the prediction considering both elastic and inelastic contributions. The width of the curves including HAL QCD predictions represents the uncertainty associated with the calculation (see Methods section ‘Corrections of the correlation function’ for details) and the grey shaded band represents, in addition, the uncertainties associated with the determination of the source radius. The width of the Coulomb curves represents only the uncertainty associated with the source radius. The considered radius values are 1.02 ± 0.05 fm for p–Ξ− and 0.95 ± 0.06 fm for p–Ω− pairs, respectively. The inset in b shows an expanded view of the p–Ω− correlation function for C(k*) close to unity. For more details see text.

Correlations can occur in nature from quantum mechanical interference, resonances, conservation laws or final-state interactions. Here, it is the final-state interactions that contribute predominantly at low relative momentum; in this work we focus on the strong and Coulomb interactions in pairs composed of a proton and either a Ξ− or a Ω− hyperon.

Protons do not decay and can hence be directly identified within the ALICE detector, but Ξ− and Ω− baryons are detected through their weak decays, Ξ− → Λ + π− and Ω− → Λ + Κ−. The identification and momentum measurement of protons, Ξ−, Ω− and their respective antiparticles are described in Methods. Figure 2 shows a sketch of the Ω− decay and the invariant mass distribution of the ΛΚ− and Λ̄K+ pairs. The clear peak corresponding to the rare Ω− and Ω̄+ baryons demonstrates the excellent identification capability, which is the key ingredient for this measurement. The contamination from misidentification is ≤5%. For the Ξ− (Ξ̄+) baryon the misidentification amounts to 8% (ref. 11).

Once the p, Ω− and Ξ− candidates and charge conjugates are selected and their 3-momenta measured, the correlation functions can be built. Since we assume that the same interaction governs baryon–baryon and antibaryon–antibaryon pairs8, we consider in the following the direct sum (⊕) of particles and antiparticles (p–Ξ− ⊕ p̄–Ξ̄+ ≡ p–Ξ− and p–Ω− ⊕ p̄–Ω̄+ ≡ p–Ω−). The determination of the correction ξ(k*) and the evaluation of the systematic uncertainties are described in Methods.

Comparison of the p–Ξ− and p–Ω− interactions
The obtained correlation functions are shown in Fig. 3a, b for the p–Ξ− and p–Ω− pairs, respectively, along with the statistical and systematic uncertainties. The fact that both correlations are well above unity implies the presence of an attractive interaction for both systems. For opposite-charge pairs, as considered here, the Coulomb interaction is attractive and its effect on the correlation function is illustrated by the green curves in both panels of Fig. 3. These curves have been obtained by solving the Schrödinger equation for p–Ξ− and p–Ω− pairs using the Correlation Analysis Tool using the Schrödinger equation (CATS) equation solver39, considering only the Coulomb interaction and assuming that the shape of the source follows a Gaussian distribution with a width equal to 1.02 ± 0.05 fm for the p–Ξ− system and to 0.95 ± 0.06 fm for the p–Ω− system, respectively. The source-size values have been determined via an independent analysis of p–p correlations15, where modifications of the source distribution due to strong decays of short-lived resonances are taken into account, and the source size is determined as a function of the transverse mass mT of the pair, as

described in Methods. The average mT of the p–Ξ− and p–Ω− pairs are 1.9 GeV/c and 2.2 GeV/c, respectively. The difference in size between the source of the p–Ξ− and p–Ω− pairs might reflect the contribution of collective effects such as (an)isotropic flow. The width of the green curves in Fig. 3 reflects the quoted uncertainty of the measured source radius. The correlations obtained, accounting only for the Coulomb interaction, considerably underestimate the strength of both measured correlations. This implies, in both cases, that an attractive interaction exists and exceeds the strength of the Coulomb interaction.

To discuss the comparison of the experimental data with the predictions from lattice QCD, it is useful to first focus on the distinct characteristics of the p–Ξ− and p–Ω− interactions. Figure 4 shows the radial shapes obtained for the strong-interaction potentials calculated from first principles by the HAL QCD (Hadrons to Atomic nuclei from Lattice QCD) collaboration for the p–Ξ− (ref. 14) and the p–Ω− (ref. 13) systems; see Methods for details. Only the most attractive (isospin I = 0 and spin S = 0) of the four components14 of the p–Ξ− interaction and the isospin I = 1/2 and spin S = 2 component of the p–Ω− interaction are shown. Aside from an attractive component, we see that the p–Ξ− interaction also contains a repulsive core starting at very small distances, below 0.2 fm. For the p–Ω− system no repulsive core is visible and the interaction is purely attractive. This very attractive interaction can accommodate a p–Ω− bound state, with a binding energy of about 2.5 MeV, considering the Coulomb and strong forces13. The p–Ξ− and p–Ω− interaction potentials look very similar to each other above a distance of 1 fm. This behaviour is not observed in phenomenological models that engage the exchange of heavy mesons and predict a quicker fall off of the potentials45.

[Figure 4: V(r) (MeV) versus r (fm) for p–Ξ− HAL QCD (I = 0, S = 0) and p–Ω− HAL QCD (I = 1/2, S = 2); inset: C(k*) versus k* (MeV) for p–Ξ− HAL QCD (I = 0, S = 0), p–Ξ− HAL QCD and p–Ω− HAL QCD (I = 1/2, S = 2).]
Fig. 4 | Potentials for the p–Ξ− and p–Ω− interactions. p–Ξ− (pink) and p–Ω− (orange) interaction potentials as a function of the pair distance predicted by the HAL QCD collaboration13,14. Only the most attractive component, isospin I = 0 and spin S = 0, is shown for p–Ξ−. For the p–Ω− interaction the I = 1/2 and spin S = 2 component is shown. The widths of the curves correspond to the uncertainties (see Methods section ‘Corrections of the correlation function’ for details) associated with the calculations. The inset shows the correlation functions obtained using the HAL QCD strong interaction potentials for: (i) the channel p–Ξ− with isospin I = 0 and spin S = 0, (ii) the channel p–Ξ− including all allowed spin and isospin combinations (dashed pink), and (iii) the channel p–Ω− with isospin I = 1/2 and spin S = 2. For details see text.

The inset of Fig. 4 shows the correlation functions obtained using the HAL QCD strong interaction potentials for: (i) the channel p–Ξ− with isospin I = 0 and spin S = 0, (ii) the channel p–Ξ− including all allowed spin and isospin combinations, and (iii) the channel p–Ω− with isospin I = 1/2 and spin S = 2. The correlation functions are computed using the experimental values for the p–Ξ− and p–Ω− source sizes. Despite the fact that the strong p–Ω− potential is more attractive than the p–Ξ− I = 0 and S = 0 potential, the resulting correlation function is lower. This is due to the presence of the bound state in the p–Ω− case46. If we consider all four isospin and spin components of the p–Ξ− interaction11 the prediction for the global p–Ξ− correlation function is lower than that for p–Ω−. Experimentally, as shown in Fig. 3, the less attractive strong p–Ξ− interaction translates into a correlation function that reaches values of 3 in comparison with the much higher values of up to 6 that are visible for the p–Ω− correlation. The theoretical predictions shown in Fig. 3 also include the effect of the Coulomb interaction.

Regarding the p–Ξ− interaction, it should be considered that strangeness-rearrangement processes can occur, such as pΞ− → ΛΛ, ΣΣ, ΛΣ. This means that the inverse processes (for example, ΛΛ → pΞ−) can also occur and modify the p–Ξ− correlation function. These contributions are accounted for within lattice calculations by exploiting the well known quark symmetries14 and are found to be very small. Moreover, the ALICE collaboration measured the Λ–Λ correlation in p–p and p–Pb collisions10 and good agreement with the shallow interaction predicted by the HAL QCD collaboration was found.

The resulting prediction for the correlation function, obtained by solving the Schrödinger equation for the single p–Ξ− channel including the HAL QCD strong and Coulomb interactions, is shown in Fig. 3a. The first measurement of the p–Ξ− interaction using p–Pb collisions11 showed a qualitative agreement to lattice QCD predictions. The improved precision of the data in the current analysis of p–p collisions is also in agreement with calculations that include both the HAL QCD and Coulomb interactions.

Detailed study of the p–Ω− correlation
Concerning the p–Ω− interaction, strangeness-rearrangement processes can also occur47, such as pΩ− → ΞΛ, ΞΣ. Such processes might affect the p–Ω− interaction in a different way depending on the relative orientation of the total spin and angular momentum of the pair. Since the proton has Jp = 1/2 and the Ω has JΩ = 3/2 and the orbital angular momentum L can be neglected for correlation studies that imply low relative momentum, the total angular momentum J equals the total spin S and can take on values of J = 2 or J = 1. The J = 2 state cannot couple to the strangeness-rearrangement processes discussed above, except through D-wave processes, which are strongly suppressed. For the J = 1 state only two limiting cases can be discussed in the absence of measurements of the pΩ− → ΞΛ, ΞΣ cross-sections.

The first case assumes that the effect of the inelastic channels is negligible for both configurations and that the radial behaviour of the interaction is driven by elastic processes, following the lattice QCD potential (see Fig. 4), for both the J = 2 and J = 1 channels. This results in a prediction, shown by the orange curve in Fig. 3b, that is close to the data in the low k* region. The second limiting case assumes, following a previous prescription47, that the J = 1 configuration is completely dominated by strangeness-rearrangement processes. The obtained correlation function is shown by the blue curve in Fig. 3b. This curve clearly deviates from the data. Both theoretical calculations also include the effect of the Coulomb interaction and they predict the existence of a p–Ω− bound state with a binding energy of 2.5 MeV, which causes a depletion in the correlation function in the k* region between 100 and 300 MeV/c, because pairs that form a bound state are lost to the correlation yield. The inset of Fig. 3 shows that in this k* region the data are consistent with unity and do not follow either of the two theoretical predictions.

At the moment, the lattice QCD predictions underestimate the data, but additional measurements are necessary to draw a firm conclusion on the existence of the bound state. Measurements of Λ–Ξ− and Σ⁰–Ξ− correlations will verify experimentally the strength of possible non-elastic contributions. Measurements of the p–Ω− correlation function in collision systems with slightly larger size (for example, p–Pb collisions at the LHC; ref. 11) will clarify the possible presence of a depletion in C(k*). Indeed, the appearance of a depletion in the correlation function depends on the interplay between the average inter-particle

distance (source size) and the scattering length associated with the p–Ω− interaction47.

Summary
We have shown that the hyperon–proton interaction can be studied in unprecedented detail in p–p collisions at √s = 13 TeV at the LHC. We have demonstrated, in particular, that even the as-yet-unknown p–Ω− interaction can be investigated with excellent precision. The comparison of the measured correlation functions shows that the p–Ω− signal is up to a factor of two larger than the p–Ξ− signal. This reflects the large difference in the strong attractive interaction predicted by the first-principles calculations by the HAL QCD collaboration. The correlation functions predicted by HAL QCD are in agreement with the measurements for the p–Ξ− interaction. For the p–Ω− interaction, the inelastic channels are not yet accounted for quantitatively within the lattice QCD calculations. Additionally, the depletion in the correlation function that is visible in the calculations around k* = 150 MeV/c, owing to the presence of a p–Ω− bound state, is not observed in the measured correlation. To draw quantitative conclusions concerning the existence of a p–Ω− bound state, we plan a direct measurement of the Λ–Ξ− and Σ0–Ξ− correlations and a study of the p–Ω− correlation in p–Pb collisions in the near future. Indeed, with the upgraded ALICE apparatus48 and the increased data sample size expected from the high luminosity phase of the LHC Run 3 and Run 4 (ref. 49), the missing interactions involving hyperons will be measured in p–p and p–Pb collisions and this should enable us to answer the question about the existence of a new baryon–baryon bound state. Since this method can be extended to almost any hadron–hadron pair, an unexpected avenue for high-precision tests of the strong interaction at the LHC has been opened.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-3001-6.

1. NPLQCD Collaboration. Nucleon–nucleon scattering parameters in the limit of SU(3) flavor symmetry. Phys. Rev. C 88, 024003 (2013).
2. Epelbaum, E., Krebs, H., Lee, D. & Meissner, U.-G. Lattice effective field theory calculations for A = 3, 4, 6, 12 nuclei. Phys. Rev. Lett. 104, 142501 (2010).
3. Eisele, F., Filthuth, H., Foehlisch, W., Hepp, V. & Zech, G. Elastic Σ±p scattering at low energies. Phys. Lett. B 37, 204–206 (1971).
4. Alexander, G. et al. Study of the Λ–N system in low-energy Λ–p elastic scattering. Phys. Rev. 173, 1452–1460 (1968).
5. Sechi-Zorn, B., Kehoe, B., Twitty, J. & Burnstein, R. Low-energy Λ–proton elastic scattering. Phys. Rev. 175, 1735–1740 (1968).
6. Muller, R. Observation of cascade hyperon interactions. Phys. Lett. B 38, 123–124 (1972).
7. Stoks, V. & de Swart, J. Comparison of potential models with the pp scattering data below 350 MeV. Phys. Rev. C 47, 761–767 (1993).
8. ALICE Collaboration. p–p, p–Λ and Λ–Λ correlations studied via femtoscopy in pp reactions at √s = 7 TeV. Phys. Rev. C 99, 024001 (2019).
9. ALICE Collaboration. Scattering studies with low-energy kaon–proton femtoscopy in proton–proton collisions at the LHC. Phys. Rev. Lett. 124, 092301 (2020).
10. ALICE Collaboration. Study of the Λ–Λ interaction with femtoscopy correlations in pp and p–Pb collisions at the LHC. Phys. Lett. B 797, 134822 (2019).
11. ALICE Collaboration. First observation of an attractive interaction between a proton and a cascade baryon. Phys. Rev. Lett. 123, 112002 (2019).
12. ALICE Collaboration. Investigation of the p–Σ0 interaction via femtoscopy in pp collisions. Phys. Lett. B 805, 135419 (2020).
13. HAL QCD Collaboration. NΩ dibaryon from lattice QCD near the physical point. Phys. Lett. B 792, 284–289 (2019).
14. Sasaki, K. et al. ΛΛ and NΞ interactions from lattice QCD near the physical point. Nucl. Phys. A 998, 121737 (2020).
15. ALICE Collaboration. Search for a common baryon source in high-multiplicity pp collisions at the LHC. Phys. Lett. B 811, 135849 (2020).
16. Epelbaum, E., Hammer, H.-W. & Meissner, U.-G. Modern theory of nuclear forces. Rev. Mod. Phys. 81, 1773–1825 (2009).
17. Hebeler, K., Holt, J., Menendez, J. & Schwenk, A. Nuclear forces and their impact on neutron-rich nuclei and neutron-rich matter. Annu. Rev. Nucl. Part. Sci. 65, 457–484 (2015).
18. Gebrerufael, E., Vobig, K., Hergert, H. & Roth, R. Ab initio description of open-shell nuclei: merging no-core shell model and in-medium similarity renormalization group. Phys. Rev. Lett. 118, 152503 (2017).
19. Klijnsma, T., Bethke, S., Dissertori, G. & Salam, G. P. Determination of the strong coupling constant αs(mZ) from measurements of the total cross section for top-antitop quark production. Eur. Phys. J. C 77, 778 (2017).
20. NPLQCD Collaboration. Two nucleon systems at mπ ~ 450 MeV from lattice QCD. Phys. Rev. D 92, 114512 (2015).
21. Wagman, M. L. et al. Baryon–baryon interactions and spin-flavor symmetry from lattice quantum chromodynamics. Phys. Rev. D 96, 114510 (2017).
22. Hatsuda, T. Lattice quantum chromodynamics and baryon-baryon interactions. Front. Phys. 13, 132105 (2018).
23. Hashimoto, O. & Tamura, H. Spectroscopy of Λ hypernuclei. Prog. Part. Nucl. Phys. 57, 564–653 (2006).
24. Nakazawa, K. et al. The first evidence of a deeply bound state of Xi−–14N system. Prog. Theor. Exp. Phys. 2015, 033D02 (2015).
25. Nagae, T. et al. Search for a Ξ bound state in the 12C(K−, K+)X reaction at 1.8 GeV/c. PoS (INPC2016) 038 (2017).
26. Francis, A. et al. Lattice QCD study of the H dibaryon using hexaquark and two-baryon interpolators. Phys. Rev. D 99, 074505 (2019).
27. Jaffe, R. L. Perhaps a stable dihyperon. Phys. Rev. Lett. 38, 195–198 (1977); erratum 38, 617 (1977).
28. Nagels, M. M., Rijken, T. A. & de Swart, J. J. Baryon–baryon scattering in a one-boson-exchange-potential approach. II. Hyperon–nucleon scattering. Phys. Rev. D 15, 2547–2564 (1977).
29. Nagels, M. M., Rijken, T. A. & de Swart, J. J. Baryon–baryon scattering in a one-boson-exchange-potential approach. III. A nucleon–nucleon and hyperon–nucleon analysis including contributions of a nonet of scalar mesons. Phys. Rev. D 20, 1633–1645 (1979).
30. Gongyo, S. et al. Most strange dibaryon from lattice QCD. Phys. Rev. Lett. 120, 212001 (2018).
31. ALICE Collaboration. Search for weakly decaying Λn and ΛΛ exotic bound states in central Pb–Pb collisions at √sNN = 2.76 TeV. Phys. Lett. B 752, 267–277 (2016).
32. Belle Collaboration. Search for an H-dibaryon with mass near 2mΛ in ϒ(1S) and ϒ(2S) decays. Phys. Rev. Lett. 110, 222002 (2013).
33. Chrien, R. H particle searches at Brookhaven. Nucl. Phys. A 629, 388–397 (1998).
34. Yoon, C. et al. Search for the H-dibaryon resonance in 12C (K−, K+ΛΛX). Phys. Rev. C 75, 022201 (2007).
35. KEK-PS E224 Collaboration. Enhanced ΛΛ production near threshold in the 12C(K−, K+) reaction. Phys. Lett. B 444, 267–272 (1998).
36. Tolos, L. & Fabbietti, L. Strangeness in nuclei and neutron stars. Prog. Part. Nucl. Phys. 112, 103770 (2020).
37. HADES Collaboration. The Λp interaction studied via femtoscopy in p + Nb reactions at √sNN = 3.18 GeV. Phys. Rev. C 94, 025201 (2016).
38. ALICE Collaboration. Enhanced production of multi-strange hadrons in high-multiplicity proton–proton collisions. Nat. Phys. 13, 535–539 (2017).
39. Mihaylov, D. et al. A femtoscopic correlation analysis tool using the Schrödinger equation (CATS). Eur. Phys. J. C 78, 394 (2018).
40. STAR Collaboration. ΛΛ correlation function in Au+Au collisions at √sNN = 200 GeV. Phys. Rev. Lett. 114, 022301 (2015).
41. Morita, K., Furumoto, T. & Ohnishi, A. ΛΛ interaction from relativistic heavy-ion collisions. Phys. Rev. C 91, 024916 (2015).
42. STAR Collaboration. The proton–Ω correlation function in Au + Au collisions at √sNN = 200 GeV. Phys. Lett. B 790, 490–497 (2019).
43. Pratt, S. Pion interferometry of quark–gluon plasma. Phys. Rev. D 33, 1314–1327 (1986).
44. Lisa, M. A., Pratt, S., Soltz, R. & Wiedemann, U. Femtoscopy in relativistic heavy ion collisions. Annu. Rev. Nucl. Part. Sci. 55, 357–402 (2005).
45. Haidenbauer, J. & Meißner, U.-G. Phenomenological view on baryon–baryon potentials from lattice QCD simulations. Eur. Phys. J. A 55, 70 (2019).
46. Morita, K., Ohnishi, A., Etminan, F. & Hatsuda, T. Probing multistrange dibaryons with proton–omega correlations in high-energy heavy ion collisions. Phys. Rev. C 94, 031901 (2016); erratum 100, 069902 (2019).
47. Morita, K. et al. Probing ΩΩ and pΩ dibaryons with femtoscopic correlations in relativistic heavy-ion collisions. Phys. Rev. C 101, 015201 (2020).
48. ALICE Collaboration. Upgrade of the ALICE experiment: letter of intent. J. Phys. G 41, 087001 (2014).
49. Citron, Z. et al. Report from Working Group 5: future physics opportunities for high-density QCD at the LHC with heavy-ion and proton beams. CERN Yellow Rep. Monogr. 7, 1159–1410 (2019).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2020

ALICE Collaboration✉

S. Acharya1, D. Adamová2, A. Adler3, J. Adolfsson4, M. M. Aggarwal5, G. Aglieri Rinella6, M.
Agnello7, N. Agrawal8,9, Z. Ahammed1, S. Ahmad10, S. U. Ahn11, Z. Akbar12, A. Akindinov13, M.
Al-Turany14, S. N. Alam1,15, D. S. D. Albuquerque16, D. Aleksandrov17, B. Alessandro18, H. M.
Alfanda19, R. Alfaro Molina20, B. Ali10, Y. Ali21, A. Alici8,9,22, N. Alizadehvandchali23, A. Alkin6,24,
J. Alme25, T. Alt26, L. Altenkamper25, I. Altsybeev27, M. N. Anaam19, C. Andrei28, D. Andreou6, A.
Andronic29, M. Angeletti6, V. Anguelov30, C. Anson31, T. Antičić32, F. Antinori33, P. Antonioli9,
N. Apadula34, L. Aphecetche35, H. Appelshäuser26, S. Arcelli22, R. Arnaldi18, M. Arratia34, I. C.
Arsene36, M. Arslandok30, A. Augustinus6, R. Averbeck14, S. Aziz37, M. D. Azmi10, A. Badalà38, Y.
W. Baek39, S. Bagnasco18, X. Bai14, R. Bailhache26, R. Bala40, A. Balbino7, A. Baldisseri41, M.
Ball42, S. Balouza43, D. Banerjee44, R. Barbera45, L. Barioglio46, G. G. Barnaföldi47, L. S.
Barnby48, V. Barret49, P. Bartalini19, C. Bartels50, K. Barth6, E. Bartsch26, F. Baruffaldi51, N.
Bastid49, S. Basu52, G. Batigne35, B. Batyunya53, D. Bauri54, J. L. Bazo Alba55, I. G. Bearden56, C.
Beattie57, C. Bedda58, N. K. Behera59, I. Belikov60, A. D. C. Bell Hechavarria29, F. Bellini6, R.
Bellwied23, V. Belyaev61, G. Bencedi47, S. Beole46, A. Bercuci28, Y. Berdnikov62, D. Berenyi47, R.
A. Bertens63, D. Berzano18, M. G. Besoiu64, L. Betev6, A. Bhasin40, I. R. Bhat40, M. A. Bhat44, H.
Bhatt54, B. Bhattacharjee65, A. Bianchi46, L. Bianchi46, N. Bianchi66, J. Bielčík67, J. Bielčíková2,
A. Bilandzic43, G. Biro47, R. Biswas44, S. Biswas44, J. T. Blair68, D. Blau17, C. Blume26, G. Boca69, F.
Bock70, A. Bogdanov61, S. Boi71, J. Bok59, L. Boldizsár47, A. Bolozdynya61, M. Bombara72, G.
Bonomi73, H. Borel41, A. Borissov61, H. Bossi57, E. Botta46, L. Bratrud26, P. Braun-Munzinger14,
M. Bregant74, M. Broz67, E. Bruna18, G. E. Bruno75,76, M. D. Buckland50, D. Budnikov77, H.
Buesching26, S. Bufalino7, O. Bugnon35, P. Buhler78, P. Buncic6, Z. Buthelezi79,80, J. B. Butt21, S.
A. Bysiak81, D. Caffarri82, A. Caliva14, E. Calvo Villar55, J. M. M. Camacho83, R. S. Camacho84, P.
Camerini85, F. D. M. Canedo74, A. A. Capon78, F. Carnesecchi22, R. Caron41, J. Castillo
Castellanos41, A. J. Castro63, E. A. R. Casula86, F. Catalano7, C. Ceballos Sanchez53, P.
Chakraborty54, S. Chandra1, W. Chang19, S. Chapeland6, M. Chartier50, S. Chattopadhyay1, S.
Chattopadhyay87, A. Chauvin71, C. Cheshkov88, B. Cheynis88, V. Chibante Barroso6, D. D.
Chinellato16, S. Cho59, P. Chochula6, T. Chowdhury49, P. Christakoglou82, C. H. Christensen56,
P. Christiansen4, T. Chujo89, C. Cicalo86, L. Cifarelli8,22, L. D. Cilladi46, F. Cindolo9, M. R.
Ciupek14, G. Clai9,148, J. Cleymans90, F. Colamaria91, D. Colella91, A. Collu34, M. Colocci22, M.
Concas18,149, G. Conesa Balbastre92, Z. Conesa del Valle37, G. Contin85,93, J. G. Contreras67, T.
M. Cormier70, Y. Corrales Morales46, P. Cortese94, M. R. Cosentino95, F. Costa6, S. Costanza69,
P. Crochet49, E. Cuautle96, P. Cui19, L. Cunqueiro70, D. Dabrowski97, T. Dahms43, A. Dainese33, F.
P. A. Damas35,41, M. C. Danisch30, A. Danu64, D. Das87, I. Das87, P. Das98, P. Das44, S. Das44, A.
Dash98, S. Dash54, S. De98, A. De Caro99, G. de Cataldo91, J. de Cuveland100, A. De Falco71, D. De
Gruttola8, N. De Marco18, S. De Pasquale99, S. Deb101, H. F. Degenhardt74, K. R. Deja97, A.
Deloff102, S. Delsanto46,80, W. Deng19, P. Dhankher54, D. Di Bari75, A. Di Mauro6, R. A. Diaz103, T.
Dietel90, P. Dillenseger26, Y. Ding19, R. Divià6, D. U. Dixit104, Ø. Djuvsland25, U. Dmitrieva105, A.
Dobrin64, B. Dönigus26, O. Dordic36, A. K. Dubey1, A. Dubla14,82, S. Dudi5, M. Dukhishyam98, P.
Dupieux49, R. J. Ehlers70, V. N. Eikeland25, D. Elia91, B. Erazmus35, F. Erhardt106, A. Erokhin27, M.
R. Ersdal25, B. Espagnon37, G. Eulisse6, D. Evans107, S. Evdokimov108, L. Fabbietti43, M. Faggin51,
J. Faivre92, F. Fan19, A. Fantoni66, M. Fasel70, P. Fecchio7, A. Feliciello18, G. Feofilov27, A.
Fernández Téllez84, A. Ferrero41, A. Ferretti46, A. Festanti6, V. J. G. Feuillard30, J. Figiel81, S.
Filchagin77, D. Finogeev105, F. M. Fionda25, G. Fiorenza91, F. Flor23, A. N. Flores68, S. Foertsch79,
P. Foka14, S. Fokin17, E. Fragiacomo93, U. Frankenfeld14, U. Fuchs6, C. Furget92, A. Furs105, M.
Fusco Girard99, J. J. Gaardhøje56, M. Gagliardi46, A. M. Gago55, A. Gal60, C. D. Galvan83, P.
Ganoti109, C. Garabatos14, J. R. A. Garcia84, E. Garcia-Solis110, K. Garg35, C. Gargiulo6, A.
Garibli111, K. Garner29, P. Gasik14,43, E. F. Gauger68, M. B. Gay Ducati112, M. Germain35, J. Ghosh87,
P. Ghosh1, S. K. Ghosh44, M. Giacalone22, P. Gianotti66, P. Giubellino14,18, P. Giubilato51, A. M. C.
Glaenzer41, P. Glässel30, A. Gomez Ramirez3, V. Gonzalez14,52, L. H. González-Trueba20, S.
Gorbunov100, L. Görlich81, A. Goswami54, S. Gotovac113, V. Grabski20, L. K. Graczykowski97, K.
L. Graham107, L. Greiner34, A. Grelli58, C. Grigoras6, V. Grigoriev61, A. Grigoryan114, S.
Grigoryan53, O. S. Groettvik25, F. Grosa7,18, J. F. Grosse-Oetringhaus6, R. Grosso14, R.
Guernane92, M. Guittiere35, K. Gulbrandsen56, T. Gunji115, A. Gupta40, R. Gupta40, I. B.
Guzman84, R. Haake57, M. K. Habib14, C. Hadjidakis37, H. Hamagaki116, G. Hamar47, M. Hamid19,
R. Hannigan68, M. R. Haque58,98, A. Harlenderova14, J. W. Harris57, A. Harton110, J. A.
Hasenbichler6, H. Hassan70, Q. U. Hassan21, D. Hatzifotiadou8,9, P. Hauer42, L. B. Havener57, S.
Hayashi115, S. T. Heckel43, E. Hellbär26, H. Helstrup117, A. Herghelegiu28, T. Herman67, E. G.
Hernandez84, G. Herrera Corral118, F. Herrmann29, K. F. Hetland117, H. Hillemanns6, C. Hills50, B.
Hippolyte60, B. Hohlweger43, J. Honermann29, D. Horak67, A. Hornung26, S. Hornung14, R.
Hosokawa31,89, P. Hristov6, C. Huang37, C. Hughes63, P. Huhn26, T. J. Humanic119, H. Hushnud87,
L. A. Husova29, N. Hussain65, S. A. Hussain21, D. Hutter100, J. P. Iddon6,50, R. Ilkaev77, H. Ilyas21,
M. Inaba89, G. M. Innocenti6, M. Ippolitov17, A. Isakov2, M. S. Islam87, M. Ivanov14, V. Ivanov62, V.
Izucheev108, B. Jacak34, N. Jacazio6,9, P. M. Jacobs34, S. Jadlovska120, J. Jadlovsky120, S.
Jaelani58, C. Jahnke74, M. J. Jakubowska97, M. A. Janik97, T. Janson3, M. Jercic106, O. Jevons107,
M. Jin23, F. Jonas29,70, P. G. Jones107, J. Jung26, M. Jung26, A. Jusko107, P. Kalinak121, A. Kalweit6, V.
Kaplin61, S. Kar19, A. Karasu Uysal122, D. Karatovic106, O. Karavichev105, T. Karavicheva105, P.
Karczmarczyk97, E. Karpechev105, A. Kazantsev17, U. Kebschull3, R. Keidel123, M. Keil6, B.
Ketzer42, Z. Khabanova82, A. M. Khan19, S. Khan10, A. Khanzadeev62, Y. Kharlov108, A. Khatun10,
A. Khuntia81, B. Kileng117, B. Kim59, B. Kim89, D. Kim124, D. J. Kim125, E. J. Kim126, H. Kim127, J.
Kim124, J. S. Kim39, J. Kim30, J. Kim124, J. Kim126, M. Kim30, S. Kim128, T. Kim124, T. Kim124, S.
Kirsch26, I. Kisel100, S. Kiselev13, A. Kisiel97, J. L. Klay129, C. Klein26, J. Klein6,18, S. Klein34, C.
Klein-Bösing29, M. Kleiner26, A. Kluge6, M. L. Knichel6, A. G. Knospe23, C. Kobdaj130, M. K.
Köhler30, T. Kollegger14, A. Kondratyev53, N. Kondratyeva61, E. Kondratyuk108, J. Konig26, S. A.
Konigstorfer43, P. J. Konopka6, G. Kornakov97, L. Koska120, O. Kovalenko102, V. Kovalenko27, M.
Kowalski81, I. Králik121, A. Kravčáková72, L. Kreis14, M. Krivda107,121, F. Krizek2, K. Krizkova
Gajdosova67, M. Krüger26, E. Kryshen62, M. Krzewicki100, A. M. Kubera119, V. Kučera6,59, C.
Kuhn60, P. G. Kuijer82, L. Kumar5, S. Kundu98, P. Kurashvili102, A. Kurepin105, A. B. Kurepin105, A.
Kuryakin77, S. Kushpil2, J. Kvapil107, M. J. Kweon59, J. Y. Kwon59, Y. Kwon124, S. L. La Pointe100, P.
La Rocca45, Y. S. Lai34, M. Lamanna6, R. Langoy131, K. Lapidus6, A. Lardeux36, P. Larionov66, E.
Laudi6, R. Lavicka67, T. Lazareva27, R. Lea85, L. Leardini30, J. Lee89, S. Lee124, S. Lehner78, J.
Lehrbach100, R. C. Lemmon48, I. León Monzón83, E. D. Lesser104, M. Lettrich6, P. Lévai47, X. Li132,
X. L. Li19, J. Lien131, R. Lietava107, B. Lim127, V. Lindenstruth100, A. Lindner28, C. Lippmann14, M. A.
Lisa119, A. Liu104, J. Liu50, S. Liu119, W. J. Llope52, I. M. Lofnes25, V. Loginov61, C. Loizides70, P.
Loncar113, J. A. Lopez30, X. Lopez49, E. López Torres103, J. R. Luhder29, M. Lunardon51, G.
Luparello93, Y. G. Ma15, A. Maevskaya105, M. Mager6, S. M. Mahmood36, T. Mahmoud42, A.
Maire60, R. D. Majka57,153, M. Malaev62, Q. W. Malik36, L. Malinina53,150, D. Mal’Kevich13, P.
Malzacher14, G. Mandaglio38,133, V. Manko17, F. Manso49, V. Manzari91, Y. Mao19, M.
Marchisone88, J. Mareš134, G. V. Margagliotti85, A. Margotti9, A. Marín14, C. Markert68, M.
Marquard26, C. D. Martin85, N. A. Martin30, P. Martinengo6, J. L. Martinez23, M. I. Martínez84, G.
Martínez García35, S. Masciocchi14, M. Masera46, A. Masoni86, L. Massacrier37, E. Masson35, A.
Mastroserio91,135, A. M. Mathis43, O. Matonoha4, P. F. T. Matuoka74, A. Matyja81, C. Mayer81, F.
Mazzaschi46, M. Mazzilli91, M. A. Mazzoni136, A. F. Mechler26, F. Meddi137, Y. Melikyan61,105, A.
Menchaca-Rocha20, C. Mengke19, E. Meninno78,99, A. S. Menon23, M. Meres138, S. Mhlanga90, Y.
Miake89, L. Micheletti46, L. C. Migliorin88, D. L. Mihaylov43, K. Mikhaylov13,53, A. N. Mishra96, D.
Miśkowiec14, A. Modak44, N. Mohammadi6, A. P. Mohanty58, B. Mohanty98, M. Mohisin
Khan10,151, Z. Moravcova56, C. Mordasini43, D. A. Moreira De Godoy29, L. A. P. Moreno84, I.
Morozov105, A. Morsch6, T. Mrnjavac6, V. Muccifora66, E. Mudnic113, D. Mühlheim29, S. Muhuri1,
J. D. Mulligan34, A. Mulliri71,86, M. G. Munhoz74, R. H. Munzer26, H. Murakami115, S. Murray90, L.
Musa6 ✉, J. Musinsky121, C. J. Myers23, J. W. Myrcha97, B. Naik54, R. Nair102, B. K. Nandi54, R.
Nania8,9, E. Nappi91, M. U. Naru21, A. F. Nassirpour4, C. Nattrass63, R. Nayak54, T. K. Nayak98, S.
Nazarenko77, A. Neagu36, R. A. Negrao De Oliveira26, L. Nellen96, S. V. Nesbo117, G. Neskovic100,
D. Nesterov27, L. T. Neumann97, B. S. Nielsen56, S. Nikolaev17, S. Nikulin17, V. Nikulin62, F.
Noferini8,9, P. Nomokonov53, J. Norman50,92, N. Novitzky89, P. Nowakowski97, A. Nyanin17, J.
Nystrand25, M. Ogino116, A. Ohlson4,30, J. Oleniacz97, A. C. Oliveira Da Silva63, M. H. Oliver57, C.
Oppedisano18, A. Ortiz Velasquez96, A. Oskarsson4, J. Otwinowski81, K. Oyama116, Y.
Pachmayer30, V. Pacik56, S. Padhan54, D. Pagano73, G. Paić96, J. Pan52, S. Panebianco41, P.
Pareek1,101, J. Park59, J. E. Parkkila125, S. Parmar5, S. P. Pathak23, B. Paul71, J. Pazzini73, H. Pei19, T.
Peitzmann58, X. Peng19, L. G. Pereira112, H. Pereira Da Costa41, D. Peresunko17, G. M. Perez103, S.
Perrin41, Y. Pestov139, V. Petráček67, M. Petrovici28, R. P. Pezzi112, S. Piano93, M. Pikna138, P.
Pillot35, O. Pinazza6,9, L. Pinsky23, C. Pinto45, S. Pisano8,66, D. Pistone38, M. Płoskoń34, M.
Planinic106, F. Pliquett26, M. G. Poghosyan70, B. Polichtchouk108, N. Poljak106, A. Pop28, S.
Porteboeuf-Houssais49, V. Pozdniakov53, S. K. Prasad44, R. Preghenella9, F. Prino18, C. A.
Pruneau52, I. Pshenichnov105, M. Puccio6, J. Putschke52, S. Qiu82, L. Quaglia46, R. E. Quishpe23,
S. Ragoni107, S. Raha44, S. Rajput40, J. Rak125, A. Rakotozafindrabe41, L. Ramello94, F. Rami60, S.
A. R. Ramirez84, R. Raniwala140, S. Raniwala140, S. S. Räsänen141, R. Rath101, V. Ratza42, I.
Ravasenga82, K. F. Read63,70, A. R. Redelbach100, K. Redlich102,152, A. Rehman25, P. Reichelt26, F.
Reidt6, X. Ren19, R. Renfordt26, Z. Rescakova72, K. Reygers30, A. Riabov62, V. Riabov62, T.
Richert4,56, M. Richter36, P. Riedler6, W. Riegler6, F. Riggi45, C. Ristea64, S. P. Rode101, M.
Rodríguez Cahuantzi84, K. Røed36, R. Rogalev108, E. Rogochaya53, D. Rohr6, D. Röhrich25, P. F.
Rojas84, P. S. Rokita97, F. Ronchetti66, A. Rosano38, E. D. Rosas96, K. Roslon97, A. Rossi33,51, A.
Rotondi69, A. Roy101, P. Roy87, O. V. Rueda4, R. Rui85, B. Rumyantsev53, A. Rustamov111, E.
Ryabinkin17, Y. Ryabov62, A. Rybicki81, H. Rytkonen125, O. A. M. Saarimaki141, R. Sadek35, S.
Sadhu1, S. Sadovsky108, K. Šafařík67, S. K. Saha1, B. Sahoo54, P. Sahoo54, R. Sahoo101, S.
Sahoo142, P. K. Sahu142, J. Saini1, S. Sakai89, S. Sambyal40, V. Samsonov61,62, D. Sarkar52, N.
Sarkar1, P. Sarma65, V. M. Sarti43, M. H. P. Sas58, E. Scapparone9, J. Schambach68, H. S.
Scheid26, C. Schiaua28, R. Schicker30, A. Schmah30, C. Schmidt14, H. R. Schmidt143, M. O.
Schmidt30, M. Schmidt143, N. V. Schmidt26,70, A. R. Schmier63, J. Schukraft56, Y. Schutz60, K.
Schwarz14, K. Schweda14, G. Scioli22, E. Scomparin18, J. E. Seger31, Y. Sekiguchi115, D.
Sekihata115, I. Selyuzhenkov14,61, S. Senyukov60, D. Serebryakov105, A. Sevcenco64, A.
Shabanov105, A. Shabetai35, R. Shahoyan6, W. Shaikh87, A. Shangaraev108, A. Sharma5, A.
Sharma40, H. Sharma81, M. Sharma40, N. Sharma5, S. Sharma40, O. Sheibani23, K. Shigaki144, M.
Shimomura145, S. Shirinkin13, Q. Shou15, Y. Sibiriak17, S. Siddhanta86, T. Siemiarczuk102, D.
Silvermyr4, G. Simatovic82, G. Simonetti6, B. Singh43, R. Singh98, R. Singh40, R. Singh101, V. K.
Singh1, V. Singhal1, T. Sinha87, B. Sitar138, M. Sitta94, T. B. Skaali36, M. Slupecki141, N. Smirnov57,
R. J. M. Snellings58, C. Soncco55, J. Song23, A. Songmoolnak130, F. Soramel51, S. Sorensen63, I.
Sputowska81, J. Stachel30, I. Stan64, P. J. Steffanic63, E. Stenlund4, S. F. Stiefelmaier30, D.
Stocco35, M. M. Storetvedt117, L. D. Stritto99, A. A. P. Suaide74, T. Sugitate144, C. Suire37, M.
Suleymanov21, M. Suljic6, R. Sultanov13, M. Šumbera2, V. Sumberia40, S. Sumowidagdo12, S.
Swain142, A. Szabo138, I. Szarka138, U. Tabassam21, S. F. Taghavi43, G. Taillepied49, J. Takahashi16,
G. J. Tambave25, S. Tang19,49, M. Tarhini35, M. G. Tarzila28, A. Tauro6, G. Tejeda Muñoz84, A.
Telesca6, L. Terlizzi46, C. Terrevoli23, D. Thakur101, S. Thakur1, D. Thomas68, F. Thoresen56, R.
Tieulent88, A. Tikhonov105, A. R. Timmins23, A. Toia26, N. Topilskaya105, M. Toppi66, F.
Torales-Acosta104, S. R. Torres67, A. Trifiró38,133, S. Tripathy96,101, T. Tripathy54, S. Trogolo51, G.
Trombetta75, L. Tropp72, V. Trubnikov24, W. H. Trzaska125, T. P. Trzcinski97, B. A. Trzeciak58,67, A.
Tumkin77, R. Turrisi33, T. S. Tveter36, K. Ullaland25, E. N. Umaka23, A. Uras88, G. L. Usai71, M.
Vala72, N. Valle69, S. Vallero18, N. van der Kolk58, L. V. R. van Doremalen58, M. van Leeuwen58, P.
Vande Vyvre6, D. Varga47, Z. Varga47, M. Varga-Kofarago47, A. Vargas84, M. Vasileiou109, A.
Vasiliev17, O. Vázquez Doce43, V. Vechernin27, E. Vercellin46, S. Vergara Limón84, L. Vermunt58,
R. Vernet146, R. Vértesi47, L. Vickovic113, Z. Vilakazi80, O. Villalobos Baillie107, G. Vino91, A.
Vinogradov17, T. Virgili99, V. Vislavicius56, A. Vodopyanov53, B. Volkel6, M. A. Völkl143, K.
Voloshin13, S. A. Voloshin52, G. Volpe75, B. von Haller6, I. Vorobyev43, D. Voscek120, J.
Vrláková72, B. Wagner25, M. Weber78, S. G. Weber29, A. Wegrzynek6, S. C. Wenzel6, J. P.
Wessels29, J. Wiechula26, J. Wikne36, G. Wilk102, J. Wilkinson8,9, G. A. Willems29, E. Willsher107,
B. Windelband30, M. Winn41, W. E. Witt63, J. R. Wright68, Y. Wu147, R. Xu19, S. Yalcin122, Y.
Yamaguchi144, K. Yamakawa144, S. Yang25, S. Yano41, Z. Yin19, H. Yokoyama58, I.-K. Yoo127, J. H.
Yoon59, S. Yuan25, A. Yuncu30, V. Yurchenko24, V. Zaccolo85, A. Zaman21, C. Zampolli6, H. J. C.
Zanoli58, N. Zardoshti6, A. Zarochentsev27, P. Závada134, N. Zaviyalov77, H. Zbroszczyk97, M.
Zhalov62, S. Zhang15, X. Zhang19, Z. Zhang19, V. Zherebchevskii27, Y. Zhi132, D. Zhou19, Y. Zhou56,
Z. Zhou25, J. Zhu14,19, Y. Zhu19, A. Zichichi8,22, G. Zinovjev24 & N. Zurlo73

1 Variable Energy Cyclotron Centre, Homi Bhabha National Institute, Kolkata, India.
2 Nuclear Physics Institute of the Czech Academy of Sciences, Řež, Czech Republic.
3 Institut für Informatik, Fachbereich Informatik und Mathematik, Johann-Wolfgang-Goethe Universität Frankfurt, Frankfurt am Main, Germany.
4 Division of Particle Physics, Department of Physics, Lund University, Lund, Sweden.
5 Physics Department, Panjab University, Chandigarh, India.
6 European Organization for Nuclear Research (CERN), Geneva, Switzerland.
7 Dipartimento DISAT del Politecnico and Sezione INFN, Turin, Italy.
8 Centro Fermi – Museo Storico della Fisica e Centro Studi e Ricerche “Enrico Fermi”, Rome, Italy.
9 INFN, Sezione di Bologna, Bologna, Italy.
10 Department of Physics, Aligarh Muslim University, Aligarh, India.
11 Korea
Institute of Science and Technology Information, Daejeon, Republic of Korea. 12Indonesian Mexico. 84High Energy Physics Group, Universidad Autónoma de Puebla, Puebla, Mexico.
Institute of Sciences, Jakarta, Indonesia. 13NRC «Kurchatov» Institute – ITEP, Moscow, Russia. 85
Dipartimento di Fisica dell’Università degli studi di Trieste and Sezione INFN, Trieste, Italy.
14
Research Division and ExtreMe Matter Institute EMMI, GSI Helmholtzzentrum für 86
INFN, Sezione di Cagliari, Cagliari, Italy. 87Institute of Nuclear Physics, Homi Bhabha National
Schwerionenforschung, Darmstadt, Germany. 15Fudan University, Shanghai, China. Institute, Kolkata, India. 88Université de Lyon, Université Lyon 1, CNRS/IN2P3, IPN-Lyon, Lyon,
16
Universidade Estadual de Campinas (UNICAMP), Campinas, Brazil. 17National Research France. 89University of Tsukuba, Tsukuba, Japan. 90University of Cape Town, Cape Town, South
Centre Kurchatov Institute, Moscow, Russia. 18INFN, Sezione di Torino, Turin, Italy. 19Central Africa. 91INFN, Sezione di Bari, Bari, Italy. 92Laboratoire de Physique Subatomique et de
China Normal University, Wuhan, China. 20Instituto de Fsica, Universidad Nacional Autónoma Cosmologie, Université Grenoble-Alpes, CNRS-IN2P3, Grenoble, France. 93INFN, Sezione di
de México, Mexico City, Mexico. 21COMSATS University Islamabad, Islamabad, Pakistan. Trieste, Trieste, Italy. 94Dipartimento di Scienze e Innovazione Tecnologica dell’Università del
22
Dipartimento di Fisica e Astronomia dell’Università and Sezione INFN, Bologna, Italy. Piemonte Orientale and INFN Sezione di Torino, Alessandria, Italy. 95Universidade Federal do
23
University of Houston, Houston, TX, USA. 24Bogolyubov Institute for Theoretical Physics, ABC, Santo Andre, Brazil. 96Instituto de Ciencias Nucleares, Universidad Nacional Autónoma
National Academy of Sciences of Ukraine, Kiev, Ukraine. 25Department of Physics and de México, Mexico City, Mexico. 97Warsaw University of Technology, Warsaw, Poland.
Technology, University of Bergen, Bergen, Norway. 26Institut für Kernphysik, Johann Wolfgang 98
National Institute of Science Education and Research, Homi Bhabha National Institute, Jatni,
Goethe-Universität Frankfurt, Frankfurt am Main, Germany. 27St. Petersburg State University, India. 99Dipartimento di Fisica ‘E.R. Caianiello’ dell’Università and Gruppo Collegato INFN,
St. Petersburg, Russia. 28Horia Hulubei National Institute of Physics and Nuclear Engineering, Salerno, Italy. 100Frankfurt Institute for Advanced Studies, Johann Wolfgang
Bucharest, Romania. 29Westfälische Wilhelms – Universität Münster, Institut für Kernphysik, Goethe-Universität Frankfurt, Frankfurt am Main, Germany. 101Indian Institute of Technology
Münster, Germany. 30Physikalisches Institut, Ruprecht-Karls-Universität Heidelberg, Indore, Indore, India. 102National Centre for Nuclear Research, Warsaw, Poland. 103Centro de
Heidelberg, Germany. 31Creighton University, Omaha, NB, USA. 32Rudjer Bošković Institute, Aplicaciones Tecnológicas y Desarrollo Nuclear (CEADEN), Havana, Cuba. 104Department of
Zagreb, Croatia. 33INFN, Sezione di Padova, Padova, Italy. 34Lawrence Berkeley National Physics, University of California, Berkeley, CA, USA. 105Institute for Nuclear Research,
Laboratory, Berkeley, CA, USA. 35SUBATECH, IMT Atlantique, Université de Nantes, Academy of Sciences, Moscow, Russia. 106Physics Department, Faculty of Science, University
CNRS-IN2P3, Nantes, France. 36Department of Physics, University of Oslo, Oslo, Norway. of Zagreb, Zagreb, Croatia. 107School of Physics and Astronomy, University of Birmingham,
37
Laboratoire de Physique des 2 Infinis, Irène Joliot-Curie, Orsay, France. 38INFN, Sezione di Birmingham, UK. 108NRC Kurchatov Institute IHEP, Protvino, Russia. 109Department of Physics,
Catania, Catania, Italy. 39Gangneung-Wonju National University, Gangneung, Republic of School of Science, National and Kapodistrian University of Athens, Athens, Greece. 110Chicago
Korea. 40Physics Department, University of Jammu, Jammu, India. 41Départment de Physique State University, Chicago, IL, USA. 111National Nuclear Research Center, Baku, Azerbaijan.
Nucléaire (DPhN), Université Paris-Saclay Centre d’Etudes de Saclay (CEA), IRFU, Saclay, 112
Instituto de Física, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, Brazil.
France. 42Helmholtz-Institut für Strahlen- und Kernphysik, Rheinische 113
Faculty of Electrical Engineering, Mechanical Engineering and Naval Architecture,
Friedrich-Wilhelms-Universität Bonn, Bonn, Germany. 43Physik Department, Technische University of Split, Split, Croatia. 114A.I. Alikhanyan National Science Laboratory (Yerevan
Universität München, Munich, Germany. 44Bose Institute, Department of Physics and Centre Physics Institute) Foundation, Yerevan, Armenia. 115University of Tokyo, Tokyo, Japan.
for Astroparticle Physics and Space Science (CAPSS), Kolkata, India. 45Dipartimento di Fisica e 116
Nagasaki Institute of Applied Science, Nagasaki, Japan. 117Faculty of Engineering and
Astronomia dell’Università degli Studi di Catania and Sezione INFN, Catania, Italy. Science, Western Norway University of Applied Sciences, Bergen, Norway. 118Centro de
46
Dipartimento di Fisica dell’Università degli studi di Torino and Sezione INFN, Turin, Italy. Investigación y de Estudios Avanzados (CINVESTAV), Mexico City, Mexico. 119Ohio State
47
Wigner Research Centre for Physics, Budapest, Hungary. 48Nuclear Physics Group, STFC University, Columbus, OH, USA. 120Technical University of Košice, Košice, Slovakia. 121Institute
Daresbury Laboratory, Daresbury, UK. 49Université Clermont Auvergne, CNRS/IN2P3, LPC, of Experimental Physics, Slovak Academy of Sciences, Košice, Slovakia. 122KTO Karatay
Clermont-Ferrand, France. 50University of Liverpool, Liverpool, UK. 51Dipartimento di Fisica e University, Konya, Turkey. 123Zentrum für Technologietransfer und Telekommunikation (ZTT),
Astronomia dell’Università degli Studi di Padova and Sezione INFN, Padua, Italy. 52Wayne Hochschule Worms, Worms, Germany. 124Yonsei University, Seoul, Republic of Korea.
State University, Detroit, MI, USA. 53Joint Institute for Nuclear Research (JINR), Dubna, Russia. 125
University of Jyväskylä, Jyväskylä, Finland. 126Jeonbuk National University, Jeonju, Republic
54
Indian Institute of Technology Bombay (IIT), Mumbai, India. 55Sección Física, Departamento of Korea. 127Department of Physics, Pusan National University, Pusan, Republic of Korea.
de Ciencias, Pontificia Universidad Católica del Perú, Lima, Peru. 56Niels Bohr Institute, 128
Department of Physics, Sejong University, Seoul, Republic of Korea. 129California
University of Copenhagen, Copenhagen, Denmark. 57Yale University, New Haven, CT, USA. Polytechnic State University, San Luis Obispo, CA, USA. 130Suranaree University of
58
Institute for Subatomic Physics, Utrecht University/Nikhef, Utrecht, Netherlands. 59Inha Technology, Nakhon Ratchasima, Thailand. 131University of South-Eastern Norway, Tonsberg,
University, Incheon, Republic of Korea. 60Université de Strasbourg, CNRS, IPHC UMR 7178, Norway. 132China Institute of Atomic Energy, Beijing, China. 133Dipartimento di Scienze MIFT,
Strasbourg, France. 61NRNU Moscow Engineering Physics Institute, Moscow, Russia. Università di Messina, Messina, Italy. 134Institute of Physics of the Czech Academy of Sciences,
62
Petersburg Nuclear Physics Institute, Gatchina, Russia. 63University of Tennessee, Knoxville, Prague, Czech Republic. 135Università degli Studi di Foggia, Foggia, Italy. 136INFN, Sezione di
TN, USA. 64Institute of Space Science (ISS), Bucharest, Romania. 65Gauhati University, Roma, Rome, Italy. 137Dipartimento di Fisica dell’Università ‘La Sapienza’ and Sezione INFN,
Department of Physics, Guwahati, India. 66INFN, Laboratori Nazionali di Frascati, Frascati, Italy. Rome, Italy. 138Faculty of Mathematics, Physics and Informatics, Comenius University
67
Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague, Bratislava, Bratislava, Slovakia. 139Budker Institute for Nuclear Physics, Novosibirsk, Russia.
Prague, Czech Republic. 68The University of Texas at Austin, Austin, TX, USA. 69Università degli 140
Physics Department, University of Rajasthan, Jaipur, India. 141Helsinki Institute of Physics
Studi di Pavia, Pavia, Italy. 70Oak Ridge National Laboratory, Oak Ridge, TN, USA. (HIP), Helsinki, Finland. 142Institute of Physics, Homi Bhabha National Institute, Bhubaneswar,
71
Dipartimento di Fisica dell’Università degli studi di Bari Aldo Moro and Sezione INFN, India. 143Physikalisches Institut, Eberhard-Karls-Universität Tübingen, Tübingen, Germany.
Cagliari, Italy. 72Faculty of Science, P.J. Šafárik University, Košice, Slovakia. 73Università di 144
Hiroshima University, Hiroshima, Japan. 145Nara Women’s University (NWU), Nara, Japan.
Brescia, Brescia, Italy. 74Universidade de São Paulo (USP), São Paulo, Brazil. 75Dipartimento 146
Centre de Calcul de l’IN2P3, Lyon, France. 147University of Science and Technology of China,
Interateneo di Fisica ‘M. Merlin’, Università degli studi di Bari Aldo Moro and Sezione INFN, Hefei, China. 148Present address: Italian National Agency for New Technologies, Energy and
Bari, Italy. 76Politecnico di Bari, Bari, Italy. 77Russian Federal Nuclear Center (VNIIEF), Sarov, Sustainable Economic Development (ENEA), Bologna, Italy. 149Present address: Dipartimento
Russia. 78Stefan Meyer Institut für Subatomare Physik (SMI), Vienna, Austria. 79iThemba LABS, DET, Politecnico di Torino, Turin, Italy. 150Present address: D.V. Skobeltsyn Institute of Nuclear
National Research Foundation, Somerset West, South Africa. 80University of the Physics, M.V. Lomonosov Moscow State University, Moscow, Russia. 151Present address:
Witwatersrand, Johannesburg, South Africa. 81The Henryk Niewodniczanski Institute of Department of Applied Physics, Aligarh Muslim University, Aligarh, India. 152Present address:
Nuclear Physics, Polish Academy of Sciences, Krakow, Poland. 82Nikhef, National Institute for Institute of Theoretical Physics, University of Wrocław, Wrocław, Poland. 153Deceased: R. D.
Subatomic Physics, Amsterdam, Netherlands. 83Universidad Autónoma de Sinaloa, Culiacán, Majka. ✉e-mail: alice-publications@cern.ch

Methods

Event selection
Events were recorded from inelastic p–p collisions by ALICE50,51 at the LHC. A trigger that requires the total signal amplitude measured in the V0 detector52 to exceed a certain threshold was used to select high-multiplicity (HM) events. The V0 detector comprises two plastic scintillator arrays placed on both sides of the interaction point at pseudorapidities 2.8 < η < 5.1 and −3.7 < η < −1.7. The pseudorapidity is defined as η = −ln[tan(θ/2)], where θ is the polar angle of the particle with respect to the proton beam axis.
At √s = 13 TeV, in the HM events, 30 charged particles in the range |η| < 0.5 are produced on average. This η range corresponds to the region within 26 degrees of the transverse plane that is perpendicular to the beam axis. The HM events are rare, constituting 0.17% of the p–p collisions that produce at least one charged particle in the pseudorapidity range |η| < 1.0. It was shown38 that HM events contain an enhanced yield of hyperons, which facilitates this analysis. The yield of Ω− in HM events is at least a factor 5 larger, on average, compared with that in total inelastic collisions53. A total of 1 × 10⁹ HM events were analysed. Additional details on the HM event selection can be found in a previous work12.

Particle tracking and identification
For the identification and momentum measurement of charged particles, the Inner Tracking System (ITS)54, Time Projection Chamber (TPC)55, and Time-Of-Flight (TOF)56 detectors of ALICE are used. All three detectors are located inside a solenoid magnetic field (0.5 T) leading to a bending of the trajectories of charged particles. The measurement of the curvature is used to reconstruct the particle momenta. Typical transverse momentum (pT) resolutions for protons, pions and kaons vary from about 2% for tracks with pT = 10 GeV/c to below 1% for pT < 1 GeV/c. The particle identity is determined by the energy lost per unit of track length inside the TPC detector and, in some cases, by the particle velocity measured in the TOF detector. Additional experimental details are discussed in a previous work51.
Protons are selected within a transverse momentum range of 0.5 < pT < 4.05 GeV/c. They are identified requiring TPC information for candidate tracks with momentum p < 0.75 GeV/c, whereas TPC and TOF information are both required for candidates with p > 0.75 GeV/c. An incorrect identification of primary protons occurs in 1% of the cases, as evaluated by Monte Carlo simulations.
Direct tracking and identification is not possible for Ξ− and Ω− hyperons and their antiparticles, because they are unstable and decay as a result of the weak interaction within a few centimetres after their production. The mean decay distances (evaluated as c × τ, where τ is the particle lifetime) of Ξ−(Ξ̄+) → Λ(Λ̄) + π−(π+) and Ω−(Ω̄+) → Λ(Λ̄) + K−(K+) are 4.9 and 2.5 cm, respectively57. Both decays are followed by a second decay of the unstable Λ(Λ̄) hyperon, Λ(Λ̄) → p(p̄) + π−(π+), with an average decay path of 7.9 cm (ref. 57). Consequently, pions (π±), kaons (K±) and protons have to be detected and then combined to search for Ξ−(Ξ̄+) and Ω−(Ω̄+) candidates. Those secondary particles are identified by the TPC information in the case of the reconstruction of Ξ−(Ξ̄+), and in the case of Ω−(Ω̄+) it is additionally required that the secondary protons and kaons are identified in the TOF detector. To measure the Ξ−(Ξ̄+) and Ω−(Ω̄+) hyperons, the two successive weak decays need to be reconstructed. The reconstruction procedure is very similar for both hyperons and is described in detail previously58. Topological selections are performed to reduce the combinatorial background, evaluated via a fit to the invariant mass distribution.

Corrections of the correlation function
The correction factor ξ(k*) accounts for the normalization of the k* distribution of pairs from mixed events, for effects produced by finite momentum resolution and for the influence of residual correlations.
The mixed-event distribution, Nmixed(k*), has to be scaled down, because the number of pairs available from mixed events is much higher than the number of pairs produced in the same collision used in Nsame(k*). The normalization parameter N is chosen such that the mean value of the correlation function equals unity in a region of k* values where the effect of final-state interactions is negligible, 500 < k* < 800 MeV/c.
The finite experimental momentum resolution modifies the measured correlation functions at most by 8% at low k*. A correction for this effect is applied. Resolution effects due to the merging of tracks that are very close to each other were evaluated and found to be negligible.
The two measured correlation functions are dominated by the contribution of the interaction between p–Ξ− and p–Ω− pairs. Nevertheless, other contributions also influence the measured correlation function. They originate either from incorrectly identified particles or from particles stemming from other weak decays (such as protons from Λ → p + π− decays) combined with primary particles. Because weak decays occur typically some centimetres away from the collision vertex, there is no final-state interaction between their decay products and the primary particles of interest. Hence, the resulting correlation function either will be completely flat or will carry the residual signature of the interaction between the particles before the decay. A method to determine the exact shape and relative yields of the residual correlations has been previously developed8,59, and it is used in this analysis. Such contributions are subtracted from the measured p–Ξ− and p–Ω− correlations to obtain the genuine correlation functions. The residual correlation stemming from misidentification is evaluated experimentally11 and its contribution is also subtracted from the measured correlation function.

Determination of the source size
The widths of the Gaussian distributions constituting S(r*), and defining the source size, are calculated on the basis of the results of the analysis of the p–p correlation function in p–p collisions at √s = 13 TeV by the ALICE collaboration15. Assuming a common source for all baryons, its size was studied as a function of the transverse mass of the baryon–baryon pair, $m_\mathrm{T} = (k_\mathrm{T}^2 + m^2)^{1/2}$, where m is the average mass and kT = |pT,1 + pT,2|/2 is the transverse momentum of the pair. The source size decreases with increasing mass, which could reflect the collective evolution of the system. The average transverse mass ⟨mT⟩ differs for the p–Ξ− and p–Ω− pairs and is equal to 1.9 GeV/c and 2.2 GeV/c, respectively. To determine the source sizes for these values, the measurement from p–p correlations (shown in figure 5 of ref. 15) is parameterized as $r_\mathrm{core} = a\,m_\mathrm{T}^{b} + c$, where rcore denotes the width of the Gaussian distribution defining the source before taking into account the effect produced by short-lived resonances.
In p–p collisions at √s = 13 TeV, Ξ− and Ω− baryons are produced mostly as primary particles, but about 2/3 of the protons originate from the decay of short-lived resonances with a lifetime of a few fm per c. As a result, the effective source size of both p–Ξ− and p–Ω− is modified. This effect is taken into account by folding the Gaussian source with an exponential distribution following the method outlined previously15. The resulting source distribution can be characterized by an effective Gaussian source radius equal to 1.02 ± 0.05 fm for p–Ξ− pairs and to 0.95 ± 0.06 fm for p–Ω− pairs. The quoted uncertainties correspond to variations of the parametrization of the p–p results according to their systematic and statistical uncertainties.

Systematic uncertainties
The systematic uncertainties associated with the genuine correlation function arise from the following sources: (i) the selection of the proton, Ξ−(Ξ̄+) and Ω−(Ω̄+), (ii) the normalization of the mixed-event distributions, (iii) uncertainties on the residual contributions, and (iv) uncertainties due to the finite momentum resolution. To evaluate the associated systematic uncertainties: (i) all single-particle and topological selection criteria are varied with respect to their default values and the analysis is repeated for 50 different random combinations of such selection criteria so that the maximum change introduced in the number of p–Ξ− and p–Ω− pairs is 25% and the changes in the purity of
protons, Ξ−(Ξ̄+) and Ω−(Ω̄+) are kept below 3%; (ii) the k*-normalization range of the mixed-events is varied, and a linear function of k* is also used for an alternative normalization which results in an asymmetric uncertainty; (iii) the shape of the residual correlations and its relative contribution are altered; and (iv) the momentum resolution and the used correction method are changed. The total systematic uncertainties associated with the genuine correlation function are maximal at low k*, reaching a value of 9% and 8% for p–Ξ− and p–Ω−, respectively.

HAL QCD potentials
Results from calculations by the HAL QCD Collaboration for the p–Ξ− (ref. 14) and p–Ω− (ref. 13) interactions are shown in Figs. 3, 4. Such interactions were studied via (2 + 1)-flavor lattice QCD simulations with nearly physical quark masses (mπ = 146 MeV/c²).
In Fig. 4, the p–Ξ− and p–Ω− potentials are shown for calculations with t/a = 12, with t the Euclidean time and a the lattice spacing of the calculations. The HAL QCD Collaboration provided 23 and 20 sets of parameters for the description of the shape of the p–Ξ− and p–Ω− potentials, respectively. Such parametrizations result from applying the jackknife method, which takes into account the statistical uncertainty of the calculations. The width of the curves in Fig. 4 corresponds to the maximum variations observed in the potential shape by using the different sets of parameters.
To obtain the correlation functions shown in Fig. 3 we consider the calculations with t/a = 12, both for p–Ξ− and p–Ω−. The statistical uncertainty of the calculations is evaluated using the jackknife variations, and a systematic uncertainty is added in quadrature evaluated by considering calculations with t/a = 11 and t/a = 13.

Data availability
All data shown in histograms and plots are publicly available on the HEPdata repository (https://hepdata.net).

Code availability
The source code used in this study is publicly available under the names AliPhysics (https://github.com/alisw/AliPhysics) and AliRoot (https://github.com/alisw/AliROOT). Further information can be provided by the authors upon reasonable request.

50. ALICE Collaboration. The ALICE experiment at the CERN LHC. JINST 3, S08002 (2008).
51. ALICE Collaboration. Performance of the ALICE experiment at the CERN LHC. Int. J. Mod. Phys. A 29, 1430044 (2014).
52. ALICE Collaboration. Performance of the ALICE VZERO system. JINST 8, P10016 (2013).
53. ALICE Collaboration. Multiplicity dependence of (multi-)strange hadron production in proton-proton collisions at √s = 13 TeV. Eur. Phys. J. C 80, 167 (2020).
54. ALICE Collaboration. Alignment of the ALICE Inner Tracking System with cosmic-ray tracks. JINST 5, P03003 (2010).
55. Alme, J. et al. The ALICE TPC, a large 3-dimensional tracking device with fast readout for ultra-high multiplicity events. Nucl. Instrum. Meth. A 622, 316–367 (2010).
56. Akindinov, A. et al. Performance of the ALICE Time-Of-Flight detector at the LHC. Eur. Phys. J. Plus 128, 44 (2013).
57. Particle Data Group Collaboration. Review of particle physics. Phys. Rev. D 98, 030001 (2018).
58. ALICE Collaboration. Strange particle production in proton–proton collisions at √s = 0.9 TeV with ALICE at the LHC. Eur. Phys. J. C 71, 1594 (2011).
59. Kisiel, A., Zbroszczyk, H. & Szymański, M. Extracting baryon–antibaryon strong-interaction potentials from pΛ femtoscopic correlation functions. Phys. Rev. C 89, 054916 (2014).

Acknowledgements We are grateful to T. Hatsuda and K. Sasaki from the HAL QCD Collaboration for their valuable suggestions and for providing the lattice QCD results regarding the p–Ξ− and p–Ω− interactions. We are also grateful to A. Ohnishi, T. Hyodo, T. Iritani, Y. Kamiya and T. Sekihara for their suggestions and discussions. We thank all the engineers and technicians of the LHC for their contributions to the construction of the experiment and the CERN accelerator teams for the performance of the LHC complex. We acknowledge the resources and support provided by all Grid centres and the Worldwide LHC Computing Grid (WLCG) collaboration. We acknowledge the following funding agencies for their support in building and running the ALICE detector: A. I. Alikhanyan National Science Laboratory (Yerevan Physics Institute) Foundation (ANSL), State Committee of Science and World Federation of Scientists (WFS), Armenia; Austrian Academy of Sciences, Austrian Science Fund (FWF): [M 2467-N36] and Nationalstiftung für Forschung, Technologie und Entwicklung, Austria; Ministry of Communications and High Technologies, National Nuclear Research Center, Azerbaijan; Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Financiadora de Estudos e Projetos (Finep), Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) and Universidade Federal do Rio Grande do Sul (UFRGS), Brazil; Ministry of Education of China (MOEC), Ministry of Science and Technology of China (MSTC) and National Natural Science Foundation of China (NSFC), China; Ministry of Science and Education and Croatian Science Foundation, Croatia; Centro de Aplicaciones Tecnológicas y Desarrollo Nuclear (CEADEN), Cubaenergía, Cuba; Ministry of Education, Youth and Sports of the Czech Republic, Czech Republic; The Danish Council for Independent Research | Natural Sciences, the VILLUM FONDEN and Danish National Research Foundation (DNRF), Denmark; Helsinki Institute of Physics (HIP), Finland; Commissariat à l’Energie Atomique (CEA) and Institut National de Physique Nucléaire et de Physique des Particules (IN2P3) and Centre National de la Recherche Scientifique (CNRS), France; Bundesministerium für Bildung und Forschung (BMBF) and GSI Helmholtzzentrum für Schwerionenforschung GmbH, Germany; General Secretariat for Research and Technology, Ministry of Education, Research and Religions, Greece; National Research, Development and Innovation Office, Hungary; Department of Atomic Energy Government of India (DAE), Department of Science and Technology, Government of India (DST), University Grants Commission, Government of India (UGC) and Council of Scientific and Industrial Research (CSIR), India; Indonesian Institute of Science, Indonesia; Centro Fermi – Museo Storico della Fisica e Centro Studi e Ricerche Enrico Fermi and Istituto Nazionale di Fisica Nucleare (INFN), Italy; Institute for Innovative Science and Technology, Nagasaki Institute of Applied Science (IIST), Japanese Ministry of Education, Culture, Sports, Science and Technology (MEXT) and Japan Society for the Promotion of Science (JSPS) KAKENHI, Japan; Consejo Nacional de Ciencia (CONACYT) y Tecnología, through Fondo de Cooperación Internacional en Ciencia y Tecnología (FONCICYT) and Dirección General de Asuntos del Personal Academico (DGAPA), Mexico; Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO), Netherlands; The Research Council of Norway, Norway; Commission on Science and Technology for Sustainable Development in the South (COMSATS), Pakistan; Pontificia Universidad Católica del Perú, Peru; Ministry of Science and Higher Education, National Science Centre and WUT ID-UB, Poland; Korea Institute of Science and Technology Information and National Research Foundation of Korea (NRF), Republic of Korea; Ministry of Education and Scientific Research, Institute of Atomic Physics and Ministry of Research and Innovation and Institute of Atomic Physics, Romania; Joint Institute for Nuclear Research (JINR), Ministry of Education and Science of the Russian Federation, National Research Centre Kurchatov Institute, Russian Science Foundation and Russian Foundation for Basic Research, Russia; Ministry of Education, Science, Research and Sport of the Slovak Republic, Slovakia; National Research Foundation of South Africa, South Africa; Swedish Research Council (VR) and Knut & Alice Wallenberg Foundation (KAW), Sweden; European Organization for Nuclear Research, Switzerland; Suranaree University of Technology (SUT), National Science and Technology Development Agency (NSDTA) and Office of the Higher Education Commission under NRU project of Thailand, Thailand; Turkish Atomic Energy Agency (TAEK), Turkey; National Academy of Sciences of Ukraine, Ukraine; Science and Technology Facilities Council (STFC), United Kingdom; National Science Foundation of the United States of America (NSF) and United States Department of Energy, Office of Nuclear Physics (DOE NP), United States of America.

Author contributions All authors have contributed to the publication, being variously involved in the design and the construction of the detectors, in writing software, calibrating subsystems, operating the detectors and acquiring data, and finally analysing the processed data. The ALICE Collaboration members discussed and approved the scientific results. The manuscript was prepared by a subgroup of authors appointed by the collaboration and subject to an internal collaboration-wide review process. All authors reviewed and approved the final version of the manuscript.

Competing interests The authors declare no competing interests.

Additional information
Correspondence and requests for materials should be addressed to L.M.
Peer review information Nature thanks Kazuya Aoki, Andrzej Kupsc, Manuel Lorenz and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
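As an illustration of the normalization step described under "Corrections of the correlation function" in Methods, the following is a minimal sketch, assuming the raw correlation function is the bin-wise ratio of same-event to mixed-event pair counts and that N is fixed by requiring a unit mean in 500 < k* < 800 MeV/c. The function name, the binning and the toy counts are hypothetical and are not taken from the AliPhysics code; the momentum-resolution and residual-correlation corrections are not modelled.

```python
def normalize_correlation(k_star, n_same, n_mixed, lo=500.0, hi=800.0):
    """Return (N, C) with C(k*) = N * n_same / n_mixed.

    N is chosen so that the mean of C over bins with lo < k* < hi (MeV/c)
    equals one. Hypothetical helper for illustration only.
    """
    # Raw bin-wise ratio of same-event to mixed-event pair counts.
    ratio = [s / m for s, m in zip(n_same, n_mixed)]
    # Keep only the bins inside the normalization window.
    window = [r for k, r in zip(k_star, ratio) if lo < k < hi]
    if not window:
        raise ValueError("no bins inside the normalization window")
    # Mean of N * ratio over the window is 1 when N = len(window) / sum(window).
    norm = len(window) / sum(window)
    return norm, [norm * r for r in ratio]


# Toy histogram (made-up numbers): bin centres in MeV/c and pair counts.
k_star = [50.0, 150.0, 250.0, 550.0, 650.0, 750.0]
n_same = [30.0, 24.0, 21.0, 20.0, 19.0, 21.0]
n_mixed = [2000.0] * 6

norm, corr = normalize_correlation(k_star, n_same, n_mixed)
window_mean = sum(corr[3:]) / 3
```

In this toy input the mixed-event sample is 100 times larger than the same-event sample, so the normalized correlation function is flat around unity at large k* and rises above it only in the lowest bins.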
Article
Dipolar evaporation of reactive molecules to below the Fermi temperature

https://doi.org/10.1038/s41586-020-2980-7
Accepted: 29 September 2020

Giacomo Valtolina1,2 ✉, Kyle Matsuda1,2, William G. Tobias1,2, Jun-Ru Li1,2, Luigi De Marco1,2 & Jun Ye1,2 ✉

The control of molecules is key to the investigation of quantum phases, in which rich
degrees of freedom can be used to encode information and strong interactions can be
precisely tuned1. Inelastic losses in molecular collisions2–5, however, have greatly
hampered the engineering of low-entropy molecular systems6. So far, the only
quantum degenerate gas of molecules has been created via association of two highly
degenerate atomic gases7,8. Here we use an external electric field along with optical
lattice confinement to create a two-dimensional Fermi gas of spin-polarized
potassium–rubidium (KRb) polar molecules, in which elastic, tunable dipolar
interactions dominate over all inelastic processes. Direct thermalization among the
molecules in the trap leads to efficient dipolar evaporative cooling, yielding a rapid
increase in phase-space density. At the onset of quantum degeneracy, we observe the
effects of Fermi statistics on the thermodynamics of the molecular gas. These results
demonstrate a general strategy for achieving quantum degeneracy in dipolar
molecular gases in which strong, long-range and anisotropic dipolar interactions can
drive the emergence of exotic many-body phases, such as interlayer pairing and
p-wave superfluidity.
The complex internal structure of molecules can be both useful and a hindrance: it represents a key resource for the development of tunable and programmable quantum devices1,9,10, but it is also responsible for strong inelastic losses during collisions11–14. Despite recent advances in molecular quantum science15–24, full control of elastic collisions between molecules has not been achieved, making it very difficult to create the low-entropy bulk molecular gases that are required for the exploration of rich many-body physics and emergent quantum phenomena1,25.
Here, we report the realization of highly tunable elastic interactions in a quantum gas of polar molecules through the application of an external electric field along a stack of two-dimensional (2D) layers generated with a one-dimensional optical lattice. The induced electric dipole moment in the laboratory frame gives rise to repulsive dipolar interactions that stabilize the molecular gas against reactive collisions and formation of collisional complexes. These long-range interactions provide a large elastic collision cross-section for identical ultracold fermionic molecules, in contrast to contact interactions26. We demonstrate the enhancement of dipolar interactions by several orders of magnitude and achieve a ratio of elastic-to-inelastic collisions that exceeds 100. This favourable interaction regime enables direct molecular thermalization and efficient evaporative cooling, allowing us to bring the molecular temperature T below the Fermi temperature TF. The onset of quantum degeneracy is signalled by deviations from the classical expansion energy as the ratio T/TF is reduced below unity7,27.
Our strategy follows previous theory proposals28–30 and our earlier experimental study on molecular reactions in quasi-two dimensions31. This geometry allows us to take advantage of the anisotropic character of the dipolar potential and retain only the repulsive side-to-side dipole–dipole interactions within each 2D site, while preventing the attractive head-to-tail interactions that facilitate losses at short range. Our recent advances in the production of degenerate Fermi gases of polar molecules7,8, combined with precise electric-field control using in-vacuum electrodes32 (Fig. 1), allow us to perform a systematic characterization of the properties of a 2D Fermi gas of polar molecules.

A long-lived 2D Fermi gas of polar molecules
The KRb 2D Fermi gas is created from an ultracold atomic mixture of fermionic ⁴⁰K and bosonic ⁸⁷Rb atoms. The atomic mixture is initially held in a crossed optical dipole trap (ODT) and then transferred into a single layer of a large-spacing lattice (LSL) with an 8-μm spatial period, which increases the mixture’s confinement along the vertical direction (y). The mixture is then transferred into a vertical lattice (VL) with spacing of 540 nm that confines it to a quasi-2D geometry. The intermediate LSL transfer results in the Rb cloud populating a controllable number of VL layers τ ranging between 5 and 15. We directly probe the number of occupied 2D layers via a matter-wave focusing technique on the Rb cloud (Fig. 1c)33,34. The measured τ is in excellent agreement with theoretical modelling of the in situ cloud size (see Methods).
Magneto-optical association is used to pair roughly half of the initial Rb atoms into ground-state KRb molecules35. This process is fast and coherent, and the resulting molecular cloud populates the same layers originally occupied by the Rb cloud. The leftover K and Rb atoms are selectively and quickly removed from the trap. In the VL, the trap frequencies are set to (ωx, ωy, ωz) = 2π × (40, 17,000, 40) Hz. The quoted trapping frequencies are for KRb throughout the paper unless otherwise
1JILA, National Institute of Standards and Technology, Boulder, CO, USA. 2Department of Physics, University of Colorado, Boulder, CO, USA. ✉e-mail: giva4289@colorado.edu; ye@jila.colorado.edu

Fig. 1 | Experimental setup. a, The 2D molecular cloud is trapped at the centre of the electrode assembly (grey). 2D optical trapping is achieved with the VL (green), which is loaded using the ODT (orange) and LSL (red). Absorption images of molecules are collected through the same lens as that used to focus the LSL. b, Sketch of the experiment as seen down the z axis. The bias electric field is generated along y, perpendicular to the 2D layers of the VL. c, Matter-wave focusing data of the Rb layers in the VL, which have a spacing of 540 nm.
stated. We create a 2D gas with N ≈ 20,000 trapped molecules, a typical temperature T ≈ 250 nK, and T/TF ranging from 1.5 to 3 depending on τ.
The 2D molecular cloud is at the centre of an in-vacuum six-electrode assembly composed of two indium tin oxide (ITO)-coated glass plates and four tungsten rods (Fig. 1a). With this, we generate a highly tunable bias electric field EDC that induces strong dipolar interactions between molecules (Fig. 1b). The ratio γ of the voltage of the rods to the voltage of the plates can be used to cancel the curvature introduced by the parallel plate edges (flat-field configuration) or to introduce additional curvatures and gradients for molecule manipulation.
The chemically reactive KRb molecules suffer from inelastic two-body losses2,12, which result in the average molecular density n decaying over time t according to a two-body rate equation of the form:

$$\frac{dn}{dt} = -\beta n^{2} + \frac{\partial n}{\partial T}\frac{dT}{dt}, \qquad (1)$$

where β is the two-body loss rate coefficient and the second term on the right side of equation (1) accounts for temperature changes affecting the density3.
In a three-dimensional (3D) harmonic trap, β increases sharply with EDC, so that inelastic interactions dominate elastic ones3,36. However, strong confinement along the direction of EDC suppresses this detrimental loss increase31 by preventing head-to-tail collisions along EDC. Even though n is large in the occupied layers, the molecular gas shows a remarkable stability with repulsive interactions turned on. With an induced dipole moment d = 0.2 D in the flat-field configuration, KRb molecules survive for several seconds (Fig. 2a). The evolution of β as a function of EDC is shown in Fig. 2b. Close to EDC = 4.7 kV cm−1, β reaches a minimum nearly five times below the zero-field value. The increase of β for d > 0.2 D is consistent with a quasi-2D picture of dipolar scattering37,38.
To understand the effect of optical confinement, we perform a thorough characterization of the 3D-to-2D crossover by measuring β versus ωy at EDC = 5 kV cm−1 (Fig. 2c). Here, β drops abruptly as the lattice vertical confinement is increased and reaches a plateau near ωy = 2π × 7 kHz, corresponding to the quasi-2D limit where kBT ≱ ħωy, with kB being the Boltzmann constant and ħ the reduced Planck constant, and the molecules principally occupy the lowest band of the VL. In contrast to the 3D case, where the heating rate exceeds 3 µK s−1, in quasi-two dimensions we do not record a substantial increase in temperature along the

[Fig. 2 panels: a, n (×10⁷ cm⁻²) versus time (s); b, β (×10⁻⁸ cm² s⁻¹) versus dipole moment (D), with EDC (kV cm⁻¹) on the top axis; c, heating rate (μK s⁻¹) and β (×10⁻⁸ cm² s⁻¹) versus ωy/(2π) (kHz).]
Fig. 2 | Long-lived polar molecules in 2D. a, Time evolution of the molecular density n at d = 0.2 D. b, Inelastic loss rate β as a function of dipole moment. All error bars are 1 standard error of the mean (s.e.), determined from two-body decay fits (equation (1)). The top x axis shows the bias electric field EDC at the corresponding dipole moment. c, Both β (grey circles) and the heating rate (orange squares) saturate at their minimum values near ωy = 2π × 7 kHz (vertical grey bar indicates uncertainty in the molecule temperature), consistent with the mechanism of quasi-2D dipolar scattering. Heating rate error bars are 1 s.e., determined from linear fits.

a b We diabatically change the power in one of the ODT beams to suddenly
0.6
increase the energy along z. Elastic collisions between the molecules
100
then redistribute the excess energy from z onto x. The rate Γth of the
0.5 temperature equilibration between the two axes is proportional to the
Γth (s–1)
Tx (μK)
dipolar elastic collision rate28,39,40. We extract Γth by fitting the increase

10 of T along x with an exponential curve.
0.4
With the loss suppressed in quasi-2D, we expect a substantial increase
of the thermalization rate Γth with a d4 dependence28,30. Comparing the
1
0.3 thermalization dynamics observed for d = 0.1 D and 0.21 D (Fig. 3a), we
0 200 400 600 800 1,000 0.03 0.1 0.3
Time (ms) Dipole moment (D) see Γth increase by a factor of 10. Over our investigated range of EDC, Γth
changes by two orders of magnitude (Fig. 3b), showing the extreme
Fig. 3 | Tuning strong dipolar elastic interactions in a 2D molecular gas.
tunability of elastic dipolar interactions in our system. We observe the
a, Cross-dimensional thermalization dynamics at d = 0.1 D (orange diamonds)
and d = 0.21 D (grey squares). Error bars are 1 s.e. of 5 independent cross-dimensional thermalization dynamics at lower dipolar strength
measurements. b, The trend of Γth extracted from cross-dimensional (d < 0.1 D) being dominated by cross-dimensional relaxation owing to
thermalization dynamics as a function of d. The solid line is a power-law fit trap anharmonicity, which limits the smallest Γth that we can measure.
for d ≥ 0.1 D that yields a power of 3.3(1.0). The filled grey circle corresponds For d ≥ 0.1 D, a fit to a power-law dependence of Γth on d yields a power of
to the measurement at d = 0.0 D, and it is artificially placed at d = 0.03 D for 3.3 ± 1.0, in good agreement with theoretical expectations28,30. For the
figure clarity. The dashed horizontal line at Γth = 2 s−1 is the background highest values of d we explored, the rate Γth is comparable to the radial
cross-dimensional relaxation from trap anharmonicity. All error bars are 1 s.e., trapping frequency, opening the way for future studies of collective
determined from exponential fits. dynamics in molecular gases41.
An estimate of the ratio of elastic-to-inelastic collisions is obtained by
comparing Γth to the initial rate of inelastic losses Γin, which is expressed
radial direction. The suppression of heating is due to cancellation of as Γin = βn0, with n0 the initial average density of the 2D gas. From the
anti-evaporation in quasi-two dimensions30 and represents another data in Fig. 2b, at d = 0.2 D, we estimate a rate Γin = 0.83(5) s−1, whereas
advantage of this configuration. Γth = 21(6) s−1 for the same dipole strength. In the temperature regime of
the cross-dimensional thermalization experiments, theoretical mod-
els30 predict that α = 8 elastic collisions are needed for each molecule to
Cross-dimensional thermalization reach thermal equilibrium. This indicates a ratio of elastic-to-inelastic
To characterize elastic interactions in our thermal molecular cloud, collisions α(Γth/Γin) = 200 ± 60, demonstrating that elastic processes
we perform cross-dimensional thermalization at various values of EDC. dominate in this system.
Fig. 4 | Evaporative cooling to the quantum degenerate regime. a, Cuts along the x axis of the combined electro-optical potential for the flat-field configuration (left panel) and at the end of the evaporation (right panel). b, Evolution of N and T (orange squares) at different stages of the evaporation trajectory at EDC = 6.5 kV cm−1 and d = 0.25 D. The power-law fit (orange line) yields Sevap = 1.06(15). The dashed grey line is for a constant T/TF, corresponding to Sevap = 2.0. Error bars are 1 s.e. of four independent measurements. c, Summary of Sevap versus d. All error bars are 1 s.e., determined from power-law fits. d, Average of 20 band-mapped absorption images of the molecular cloud in the x–y plane after 5.84 ms of time of flight for T/TF = 2.0(1) (top) and T/TF = 0.81(15) (bottom). e, Optical density (OD, dimensionless) profiles (orange circles for T/TF = 2.0(1) and grey diamonds for T/TF = 0.81(15)) of the images in d (integrated along y), together with the Fermi–Dirac fit to the whole cloud (grey line) and the Gaussian fit to the outer wings (orange line). f, Measurement of δU/U at different values of T/TF from the Fermi–Dirac fit to the entire cloud (grey circles) and from the Gaussian fit to the outer wings of the cloud (orange squares). The solid and dashed curves show δU/U for the 2D and 3D ideal Fermi gases, respectively. All error bars are 1 s.e., determined from Gaussian or polylogarithmic fits.

Electric-field-controlled evaporative cooling
The large elastic-to-inelastic collision ratio is an excellent setting for evaporatively cooling to enhance the phase-space density of our molecular cloud. For non-degenerate 2D fermionic gases, phase-space density is inversely proportional to (T/TF)², and phase-space density increases only if the change of N versus T during evaporation fulfills the criterion:

$$S_{\mathrm{evap}} = \frac{\partial \log N}{\partial \log T} < 2. \qquad (2)$$

When Sevap = 2, the gas maintains a constant T/TF.

The efficiency of evaporative cooling relies on our ability to selectively remove the hottest molecules from the trap and to let the remaining molecules re-thermalize to a lower temperature. Reducing the trap depth by lowering the optical trap power for evaporation, as is routinely done in ultracold atom experiments, is not a viable solution here because we cannot lower the tight 2D confinement without affecting the stability of the cloud (Fig. 2c). Instead, by increasing γ with respect to the flat-field configuration, we introduce a tunable anti-trapping electric field along the x direction to reduce the radial confinement experienced by the molecules (Fig. 4a). By measuring the change of ωx as a function of γ, we can directly reconstruct the combined electro-optical potential and benchmark its theoretical modelling (see Extended Data Fig. 2).

For the evaporation measurement, we start with a 2D gas with layer number τ = 5 ± 1, ωy = 2π × 17 kHz, and an average T/TF = 1.5(1). After creating the molecules (see Methods), we ramp EDC to a target field while keeping γ at the flat-field value. We trigger the evaporation by increasing γ to reduce the trap depth. We do not observe any evaporation until the truncation parameter η, defined as the ratio of trap depth over thermal energy kBT, reaches a value of 4 (see Methods), in good agreement with theoretical expectations30. We further increase γ over a timescale of hundreds of milliseconds, which is long enough for the molecules to efficiently re-thermalize at a lower T as the trap depth is reduced. At the end of the evaporation ramp, we return to the flat-field configuration and ramp EDC back to its initial value. We coherently convert the ground state molecules back to the Feshbach state and image the cloud of Feshbach molecules after band-mapping from the VL. Detailed time sequences for the evolution of EDC, γ and trap depth are shown in the Methods.

At EDC = 6.5 kV cm−1, the evolution of N and T at different stages of the optimized evaporation sequence is shown in Fig. 4b. To characterize the evaporation efficiency, we fit the N versus T dependence with a power-law function to extract Sevap. For the data shown in Fig. 4b, we obtain Sevap = 1.06(15), far below the threshold of 2 required to increase phase-space density. The trend of Sevap versus d is plotted in Fig. 4c and reaches a minimum (that is, maximum increase in phase-space density) at d = 0.25 D, where the ratio of elastic-to-inelastic collisions is the largest37,38.

When we cool molecules to T < TF (Fig. 4d), we witness the onset of Fermi degeneracy, which is signalled by deviations from classical thermodynamics owing to the increasing role of the Pauli exclusion principle. Here,

$$T_{\mathrm{F}} = \frac{\hbar\omega_{R}}{k_{\mathrm{B}}}\left(\frac{2N}{\tau}\right)^{1/2},$$

with ωR = √(ωx ωz) the geometric mean of the radial trapping frequency. Our best result produced a 2D molecular Fermi gas with N = 1.7(1) × 10³ and T/TF = 0.6(2).

We extract T by using either a fit to the Fermi–Dirac distribution on the entire expanded cloud or a Gaussian fit restricted to the cloud’s outer wings (see Methods). As shown in Fig. 4e, for T/TF = 0.81(15) the Gaussian fit to the outer wings of the time of flight density profile overestimates the density at the centre. We quantify this through the increasing difference δU = U − Ucl between the energy U of the fermionic gas and the energy Ucl ∝ kBT, as T/TF decreases7,27. U is determined from a Gaussian fit to the whole cloud (see Methods) and Ucl is calculated based on the measured T. Owing to the different density of states in the harmonic trap, the chemical potential crosses zero for T/TF = 0.78 for the 2D case, in contrast to 0.57 for the 3D case. Correspondingly, the 2D Fermi gas shows a larger δU with respect to the 3D case at the same T/TF (ref. 42). As we reach T < TF, in excellent agreement with theoretical expectations, we observe a large increase of δU (Fig. 4f). This is a hallmark for the onset of quantum degeneracy in trapped Fermi gases27.

Conclusions
We have realized a 2D Fermi gas of reactive polar molecules where precisely tunable elastic dipolar interactions dominate all inelastic processes. This allowed us to perform evaporative cooling of molecules to the onset of Fermi degeneracy. We demonstrated a general and robust scheme for ultracold gases of polar molecules to reach quantum degeneracy. For example, using a strong 2D confinement and large dipolar interactions, this method should enable Bose–Einstein condensation in bosonic molecular gases. It has long been anticipated that quantum gases of polar molecules in two dimensions would allow access to strongly correlated many-body phases43–52. Our results set the stage for exploration of these exotic regimes.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2980-7.

1. Bohn, J. L., Rey, A. M. & Ye, J. Cold molecules: progress in quantum engineering of chemistry and quantum matter. Science 357, 1002–1010 (2017).
2. Ospelkaus, S. et al. Quantum-state controlled chemical reactions of ultracold potassium-rubidium molecules. Science 327, 853–857 (2010).
3. Ni, K.-K. et al. Dipolar collisions of polar molecules in the quantum regime. Nature 464, 1324–1328 (2010).
4. Guo, M. et al. Dipolar collisions of ultracold ground-state bosonic molecules. Phys. Rev. X 8, 041044 (2018).
5. Gregory, P. D. et al. Sticky collisions of ultracold RbCs molecules. Nat. Commun. 10, 3104 (2019).
6. Moses, S. A. et al. Creation of a low-entropy quantum gas of polar molecules in an optical lattice. Science 350, 659–662 (2015).
7. De Marco, L. et al. A degenerate Fermi gas of polar molecules. Science 363, 853–856 (2019).
8. Tobias, W. G. et al. Thermalization and sub-Poissonian density fluctuations in a degenerate molecular Fermi gas. Phys. Rev. Lett. 124, 033401 (2020).
9. André, A. et al. A coherent all-electrical interface between polar molecules and mesoscopic superconducting resonators. Nat. Phys. 2, 636–642 (2006).
10. DeMille, D., Doyle, J. M. & Sushkov, A. O. Probing the frontiers of particle physics with tabletop-scale experiments. Science 357, 990–994 (2017).
11. Gregory, P. D., Blackmore, J. A., Bromley, S. L. & Cornish, S. L. Loss of ultracold 87Rb133Cs molecules via optical excitation of long-lived two-body collision complexes. Phys. Rev. Lett. 124, 163402 (2020).
12. Hu, M. G. et al. Direct observation of bimolecular reactions of ultracold KRb molecules. Science 366, 1111–1115 (2019).
13. Kirste, M. et al. Quantum-state resolved bimolecular collisions of velocity-controlled OH with NO radicals. Science 338, 1060–1063 (2012).
14. Christianen, A., Zwierlein, M. W., Groenenboom, G. C. & Karman, T. Photoinduced two-body loss of ultracold molecules. Phys. Rev. Lett. 123, 123402 (2019).
15. Danzl, J. G. et al. Quantum gas of deeply bound ground state molecules. Science 321, 1062–1066 (2008).
16. Seeßelberg, F. et al. Extending rotational coherence of interacting polar molecules in a spin-decoupled magic trap. Phys. Rev. Lett. 121, 253401 (2018).
17. Will, S. A., Park, J. W., Yan, Z. Z., Loh, H. & Zwierlein, M. W. Coherent microwave control of ultracold 23Na40K molecules. Phys. Rev. Lett. 116, 225306 (2016).
18. Barry, J. F., McCarron, D. J., Norrgard, E. B., Steinecker, M. H. & Demille, D. Magneto-optical trapping of a diatomic molecule. Nature 512, 286–289 (2014).
19. Truppe, S. et al. Molecules cooled below the Doppler limit. Nat. Phys. 13, 1173–1176 (2017).
20. Anderegg, L. et al. An optical tweezer array of ultracold molecules. Science 365, 1156–1158 (2019).
21. Ding, S., Wu, Y., Finneran, I. A., Burau, J. J. & Ye, J. Sub-Doppler cooling and compressed trapping of YO molecules at μK temperatures. Phys. Rev. X 10, 021049 (2020).
22. Yang, H. et al. Observation of magnetically tunable Feshbach resonances in ultracold 23Na40K + 40K collisions. Science 363, 261–264 (2019).
23. Son, H., Park, J. J., Ketterle, W. & Jamison, A. O. Collisional cooling of ultracold molecules. Nature 580, 197–200 (2020).
24. Segev, Y. et al. Collisions between cold molecules in a superconducting magnetic trap. Nature 572, 189–193 (2019).
25. Baranov, M. A., Dalmonte, M., Pupillo, G. & Zoller, P. Condensed matter theory of dipolar quantum gases. Chem. Rev. 112, 5012–5061 (2012).
26. Aikawa, K. et al. Reaching Fermi degeneracy via universal dipolar scattering. Phys. Rev. Lett. 112, 010404 (2014).
27. DeMarco, B. & Jin, D. S. Onset of Fermi degeneracy in a trapped atomic gas. Science 285, 1703–1706 (1999).
28. Quéméner, G. & Bohn, J. L. Dynamics of ultracold molecules in confined geometry and electric field. Phys. Rev. A 83, 012705 (2011).
29. Micheli, A. et al. Universal rates for reactive ultracold polar molecules in reduced dimensions. Phys. Rev. Lett. 105, 073202 (2010).
30. Zhu, B., Quéméner, G., Rey, A. M. & Holland, M. J. Evaporative cooling of reactive polar molecules confined in a two-dimensional geometry. Phys. Rev. A 88, 063405 (2013).
31. de Miranda, M. H. G. et al. Controlling the quantum stereodynamics of ultracold bimolecular reactions. Nat. Phys. 7, 502–507 (2011).
32. Covey, J. P. Enhanced Optical and Electric Manipulation of a Quantum Gas of KRb Molecules. Thesis, Univ. Colorado, https://doi.org/10.1007/978-3-319-98107-9 (2018).
33. Shvarchuck, I. et al. Bose–Einstein condensation into nonequilibrium states studied by condensate focusing. Phys. Rev. Lett. 89, 270404 (2002).
34. Hueck, K. et al. Two-dimensional homogeneous Fermi gases. Phys. Rev. Lett. 120, 060402 (2018).
35. Ni, K.-K. et al. A high phase-space-density gas of polar molecules. Science 322, 231–235 (2008).
36. Quéméner, G. & Bohn, J. L. Strong dependence of ultracold chemical rates on electric dipole moments. Phys. Rev. A 81, 022702 (2010).
37. Bohn, J. L., Cavagnero, M. & Ticknor, C. Quasi-universal dipolar scattering in cold and ultracold gases. New J. Phys. 11, 055039 (2009).
38. Ticknor, C. Quasi-two-dimensional dipolar scattering. Phys. Rev. A 81, 042708 (2010).
39. Bohn, J. L. & Jin, D. S. Differential scattering and rethermalization in ultracold dipolar gases. Phys. Rev. A 89, 022702 (2014).
40. Aikawa, K. et al. Anisotropic relaxation dynamics in a dipolar Fermi gas driven out of equilibrium. Phys. Rev. Lett. 113, 263201 (2014).
41. Babadi, M. & Demler, E. Collective excitations of quasi-two-dimensional trapped dipolar fermions: transition from collisionless to hydrodynamic regime. Phys. Rev. A 86, 063638 (2012).
42. Giorgini, S., Pitaevskii, L. P. & Stringari, S. Theory of ultracold atomic Fermi gases. Rev. Mod. Phys. 80, 1215 (2008).
43. Büchler, H. P. et al. Strongly correlated 2D quantum phases with cold polar molecules: controlling the shape of the interaction potential. Phys. Rev. Lett. 98, 060404 (2007).
44. Góral, K., Santos, L. & Lewenstein, M. Quantum phases of dipolar bosons in optical lattices. Phys. Rev. Lett. 88, 170406 (2002).
45. Capogrosso-Sansone, B., Trefzger, C., Lewenstein, M., Zoller, P. & Pupillo, G. Quantum phases of cold polar molecules in 2D optical lattices. Phys. Rev. Lett. 104, 125301 (2010).
46. Cooper, N. R. & Shlyapnikov, G. V. Stable topological superfluid phase of ultracold polar fermionic molecules. Phys. Rev. Lett. 103, 155302 (2009).
47. Gorshkov, A. V. et al. Tunable superfluidity and quantum magnetism with ultracold polar molecules. Phys. Rev. Lett. 107, 115301 (2011).
48. Potter, A. C., Berg, E., Wang, D. W., Halperin, B. I. & Demler, E. Superfluidity and dimerization in a multilayered system of fermionic polar molecules. Phys. Rev. Lett. 105, 220406 (2010).
49. Yao, N. Y. et al. Many-body localization in dipolar systems. Phys. Rev. Lett. 113, 243002 (2014).
50. Barbiero, L., Menotti, C., Recati, A. & Santos, L. Out-of-equilibrium states and quasi-many-body localization in polar lattice gases. Phys. Rev. B 92, 180406 (2015).
51. Zinner, N. T. & Bruun, G. M. Density waves in layered systems with fermionic polar molecules. Eur. Phys. J. D 65, 133–139 (2011).
52. Peter, D., Müller, S., Wessel, S. & Büchler, H. P. Anomalous behavior of spin systems with dipolar interactions. Phys. Rev. Lett. 109, 025303 (2012).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© The Author(s), under exclusive licence to Springer Nature Limited 2020
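
The evaporation criterion of equation (2) in the main text can be illustrated with a short sketch. It is not the authors' analysis code: the function names are hypothetical, the trajectory is synthetic rather than measured, and the fit is an unweighted least-squares line in log-log space. It uses the scaling PSD ∝ (T/TF)⁻² together with TF ∝ N^(1/2), so that PSD ∝ N/T² ∝ T^(Sevap − 2).

```python
import math


def fit_s_evap(temperatures, numbers):
    """Unweighted least-squares slope of log N versus log T (S_evap in N ~ T**S_evap)."""
    xs = [math.log(t) for t in temperatures]
    ys = [math.log(n) for n in numbers]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx


def psd_gain(s_evap, t_initial, t_final):
    """Relative change of the 2D phase-space density, PSD ~ N / T**2 ~ T**(S_evap - 2)."""
    return (t_final / t_initial) ** (s_evap - 2.0)


# Synthetic trajectory for illustration only (NOT data from the paper): N ~ T**1.06.
temps_uK = [0.20, 0.15, 0.10, 0.07]
numbers = [4000.0 * (t / 0.20) ** 1.06 for t in temps_uK]

s_evap = fit_s_evap(temps_uK, numbers)
gain = psd_gain(s_evap, temps_uK[0], temps_uK[-1])
print(f"S_evap = {s_evap:.2f}, phase-space density gain = {gain:.2f}")
```

For any Sevap below 2 the gain exceeds 1 as T falls, and at Sevap = 2 it is exactly 1, which corresponds to the constant-T/TF case noted after equation (2).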

Methods

Experimental protocol
The experiment starts with an ultracold atomic mixture of 40K and 87Rb, held in the ODT at a magnetic field of 555 G. The trap frequencies for Rb in the ODT are (ωx, ωy, ωz) = 2π × (40, 180, 40) Hz. The atomic mixture is then transferred into a single layer of the LSL. The LSL beams propagate at a shallow angle of 4 degrees along z, resulting in a lattice spacing of 8 µm along y. At the end of the LSL ramp, we decrease the ODT power, so that the trap frequencies for Rb in the combined trap are (ωx, ωy, ωz) = 2π × (25, 600, 25) Hz. Typically, we have 4.1 × 10⁵ K atoms and 7.0 × 10⁴ Rb atoms at T = 115(10) nK. About 30% of Rb is condensed. At this point, we load the mixture into the VL and adjust the ODT in order for the KRb molecules to experience radial trap frequencies at zero electric field of (ωx, ωz) = 2π × (40, 40) Hz, with ωy/(2π) ranging from a few kilohertz up to 20 kHz. To compensate for the limited transmittivity of the ITO plates at the 1.064-µm VL wavelength and to avoid spurious superlattices, the VL beams have an 11-degree tilt with respect to y, resulting in a lattice spacing of 540 nm.

To create molecules, we first sweep the magnetic field adiabatically through the KRb heteronuclear Feshbach resonance at 546.6 G. The magnetic field is ramped from 555 G to 545.5 G in 4 ms, creating 2.5 × 10⁴ Feshbach molecules that are subsequently transferred to the absolute KRb ground state by stimulated Raman adiabatic passage (STIRAP)1. By tuning the Raman lasers, we create KRb molecules at 0 kV cm−1 or 4.5 kV cm−1. For molecule creation at 4.5 kV cm−1, the field is ramped to the target value 10 ms before the Feshbach sweep. The STIRAP one-way transfer efficiency is 85(2)% at 0 kV cm−1 and 82(3)% at 4.5 kV cm−1. We do not observe any dependence of β and Sevap on the initial value of the electric field.

Matter-wave focusing and layer counting
The VL layer spacing of 540 nm is too small to resolve with conventional absorption imaging. To quantify the number of occupied layers τ, we use a matter-wave technique that maps the in situ density distribution onto the momentum distribution, which can then be imaged in time of flight. To do so, we instantaneously release the cloud from the VL and the LSL into the ODT. The cloud expands into the ODT harmonic potential for a quarter of the oscillation period along y. This corresponds to a 90-degree rotation in phase space. As a result, after the rotation, the momentum distribution along y in time of flight corresponds to the original in situ density profile. Increasing the time of flight increases the layer separation until the layers can be resolved optically.

From a set of averaged matter-wave density profiles, we obtain a histogram with the normalized particle number per layer from which we extract the number of layers τ. We perform this analysis on a cloud of Rb atoms without K, eliminating the K–Rb interactions during the phase space rotation time. The Rb cloud used for matter-wave amplification imaging has the same trap parameters, number and temperature as the Rb cloud used for the molecule experiment. When the K–Rb interactions are removed by setting the magnetic field to the zero-crossing of the Feshbach resonance, the full contrast is restored. For the data in Fig. 1c, obtained by averaging 20 matter-wave images of the Rb cloud, the density histogram is shown in Extended Data Fig. 1. For a fixed molecule distribution, the definition of τ depends on the physical quantity being calculated. Using τ = N/⟨Ni⟩, where ⟨Ni⟩ is the average particle number per layer over the measured distribution, we extract τ = 4.6(2) for the data in Extended Data Fig. 1. Theoretical modelling53 for the Rb cloud in the same conditions yields a consistent value of τ = 5.1(2).

Our measurements of the molecular cloud thus involve averaging over layers that are not equally populated. To determine T/TF, we use the layer-averaged Fermi temperature

$$T_{\mathrm{F}} = \frac{\hbar\omega_{R}}{k_{\mathrm{B}}}\left\langle\sqrt{2N_{i}}\right\rangle = \frac{\hbar\omega_{R}}{k_{\mathrm{B}}}\left(\frac{2N}{\tau}\right)^{1/2},$$

where τ = N/⟨√Ni⟩² is an effective number of layers that accounts for the nonlinear dependence of TF on Ni. For the data in Extended Data Fig. 1, we extract τ = 4.9(0.2). For the T/TF data in the paper we thus use the closest estimate τ = 5 ± 1, where the uncertainty accounts for possible systematic errors arising from non-uniform conversion of Rb to KRb and variation of evaporation efficiency between the layers (since the density, and hence thermalization rate, varies between the layers).

To determine the 2D density for the measurement of β, we need to use a time-averaged layer number that considers the density dependence of the loss in each layer3. The layer-averaged 2D density is defined as n = N/(4πσ²τ), where σ is the root-mean-square cloud size in the radial direction. Through numerical simulation, we obtain the decay over time for a cloud with the layer distribution plotted in Extended Data Fig. 1 and compare it with the decay of a gas with a uniform layer distribution and the same number N. In this case, we define τ as the value for which β in the uniform case matches β in the non-uniform case. For Extended Data Fig. 1, we obtain τ = 8 ± 1.

Electric field potential
The anti-trapping potential that is used for dipolar evaporation introduces an anti-curvature that changes the trap frequency ωx as a function of γ. Owing to the geometry of our electrodes, when γ = 0.4225 the electric field potential at the molecule position is as homogeneous as possible (fourth-order cancellation of the electric-field curvature). By increasing (decreasing) γ, ωx decreases (increases), as shown in Extended Data Fig. 2 at EDC = 5 kV cm−1. Our results follow the expected trend from finite-element simulations of the combined electro-optical potential at different γ.

Electric-field evaporation ramps
For the evaporation experiments, we lower the trap depth by increasing γ over time. The trap depth at each time point of the evaporation is estimated by simulations of the combined electro-optical potential, which is benchmarked with the measurement of ωx versus γ displayed in Extended Data Fig. 2. For the data shown in Fig. 4, the evaporation ramp takes about 800 ms and EDC, γ and trap depth evolve over time according to the plots in Extended Data Fig. 3. We also plot the trend of the parameter η and T/TF at different time points of the evaporation sequence.

Thermometry of 2D Fermi gas
The 2D in situ density n of the molecular Fermi gas is given by the Fermi–Dirac distribution54:

$$n(x, z) = -\frac{1}{\lambda_{\mathrm{dB}}^{2}}\,\mathrm{Li}_{1}\!\left(-\exp\!\left[-\beta_{\mathrm{th}}\left(\tfrac{1}{2}m\omega_{x}^{2}x^{2} + \tfrac{1}{2}m\omega_{z}^{2}z^{2} - \mu\right)\right]\right), \qquad (3)$$

where m is the molecular mass, βth = 1/(kBT), µ the chemical potential, λdB = √(2πħ²βth/m) the thermal de Broglie wavelength, and Lin the polylogarithmic function of order n. The column-integrated density profile is:

$$n_{\mathrm{int}}(x) = -\frac{1}{\lambda_{\mathrm{dB}}^{2}}\sqrt{\frac{2\pi}{\beta_{\mathrm{th}}m\omega_{z}^{2}}}\,\mathrm{Li}_{3/2}\!\left(-\exp\!\left[\beta_{\mathrm{th}}\left(\mu - \tfrac{1}{2}m\omega_{x}^{2}x^{2}\right)\right]\right). \qquad (4)$$

After a certain time of flight t, the x coordinate scales by a factor 1/√(1 + ωx²t²). The density is consequently divided by √(1 + ωx²t²) for proper renormalization. The chemical potential µ is defined through the relation:

$$\frac{N}{\tau} = -\left(\frac{k_{\mathrm{B}}T}{\hbar\omega_{R}}\right)^{2}\mathrm{Li}_{2}\!\left(-e^{\beta_{\mathrm{th}}\mu}\right). \qquad (5)$$

Combining equation (5) with the definition of TF in 2D, we obtain:

$$\left(\frac{T}{T_{\mathrm{F}}}\right)^{2} = -\frac{1}{2\,\mathrm{Li}_{2}\!\left(-e^{\beta_{\mathrm{th}}\mu}\right)}, \qquad (6)$$

which allows us to extract the ratio T/TF from the polylogarithmic fit.
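
As a numerical illustration of equation (6), the sketch below evaluates T/TF from the fitted quantity βthµ. It is a minimal example and not the authors' fitting code: the function names are hypothetical, the dilogarithm is computed from its power series with a fixed number of terms, and fugacities above 1 are handled with the standard inversion identity Li₂(−z) + Li₂(−1/z) = −π²/6 − ln²(z)/2.

```python
import math


def li2_neg(z, terms=20000):
    """Dilogarithm Li_2(-z) for a fugacity z > 0 (truncated series; sketch accuracy only)."""
    if z > 1.0:
        # Inversion identity: Li2(-z) = -pi^2/6 - ln(z)^2/2 - Li2(-1/z)
        return -math.pi ** 2 / 6 - 0.5 * math.log(z) ** 2 - li2_neg(1.0 / z, terms)
    return sum((-z) ** k / k ** 2 for k in range(1, terms + 1))


def t_over_tf_2d(beta_mu):
    """Equation (6): (T/T_F)^2 = -1 / (2 Li_2(-exp(beta_th * mu)))."""
    return math.sqrt(-1.0 / (2.0 * li2_neg(math.exp(beta_mu))))


# At mu = 0 the closed form is sqrt(6)/pi, since Li_2(-1) = -pi^2/12.
print(f"T/T_F at mu = 0: {t_over_tf_2d(0.0):.2f}")
```

At µ = 0 this gives T/TF ≈ 0.78, which matches the zero crossing of the chemical potential quoted for the 2D case in the main text.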
From the Gaussian fit to the whole cloud, we obtain the Gaussian width σ and a release temperature Trel, defined as:

$$T_{\mathrm{rel}} = \frac{m\omega_{x}^{2}\sigma^{2}}{k_{\mathrm{B}}\left(1 + \omega_{x}^{2}t^{2}\right)}. \qquad (7)$$

The release temperature Trel is proportional to the energy density U of the Fermi gas, U = 2kBTrel, which saturates to a non-zero value as T → 0. In contrast, the energy density Ucl = 2kBT of a classical gas approaches zero as T → 0.

When the Gaussian fit is constrained to only the outer wings (that is, the high-momentum states) of the cloud, we can extract a new width σout from which, using equation (7), we obtain a corrected temperature Tout through the relation:

$$T_{\mathrm{out}} = \frac{m\omega_{x}^{2}\sigma_{\mathrm{out}}^{2}}{k_{\mathrm{B}}\left(1 + \omega_{x}^{2}t^{2}\right)}. \qquad (8)$$

As the excluded region from the centre of the Gaussian fit is expanded, Tout decreases from an initial value of Trel and approaches T. This is shown in Extended Data Fig. 4, where we plot the ratio Tout/Trel at different exclusion regions in units of σ. For the range of T/TF studied here, we find that an exclusion region of 1.5σ leaves us enough signal-to-noise ratio for the fit to properly converge and to return a value of Tout that is only 5% higher than T. In the main text, the Gaussian fit on the outer wings is performed with an excluded region of 1.5σ.

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request. Source data are provided with this paper.

53. Dalfovo, F., Giorgini, S., Pitaevskii, L. P. & Stringari, S. Theory of Bose–Einstein condensation in trapped gases. Rev. Mod. Phys. 71, 463–512 (1999).
54. Inguscio, M., Ketterle, W. & Salomon, C. Ultra-cold Fermi gases. In Proceedings of the International School of Physics ‘Enrico Fermi’ (2007).

Acknowledgements We acknowledge funding from NIST, DARPA DRINQS, ARO MURI and NSF Phys-1734006. We thank J. L. Bohn, A. M. Kaufman, and C. Miller for careful reading of the manuscript and T. Brown for technical assistance.

Author contributions All authors contributed to carrying out the experiments, interpreting the results, and writing the manuscript.

Additional information
Correspondence and requests for materials should be addressed to G.V. or J.Y.
Peer review information Nature thanks Georgy Shlyapnikov and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Extended Data Fig. 1 | Layer occupancy. Histogram of the average number per
layer (relative population) for the data shown in Fig. 1c.
Extended Data Fig. 2 | Trend of ωx/(2π) versus γ. Grey points are the
experimental measurements at EDC = 5 kV cm−1, the solid grey line is a linear fit to
guide the eye, and the dashed line is the prediction (Sim) from the
finite-element model. All error bars are 1 standard deviation of the mean.
Extended Data Fig. 3 | Evaporation sequence. a, Ramp in EDC. b, Ramp in γ. c, Trap depth versus time from the finite-element model of electro-optical potential. d, Evolution of η, calculated by taking the ratio of the trap depth and temperature at each time point. e, Evolution of T/TF during the ramp. All error bars are 1 standard error of the mean.
Extended Data Fig. 4 | Fermi gas thermometry. Trend of Tout/Trel as a function
of the excluded region from the centre of the Gaussian fit for T/TF = 0.81(15)
(orange diamonds) and T/TF = 2.0(1) (black circles). Solid lines are Gaussian fits
to simulated density profiles for T/TF = 2.0 (black) and T/TF = 0.8 (orange). All
error bars are 1 standard error of the mean.
Article
Operation of an optical atomic clock with a Brillouin laser subsystem

William Loh1,3 ✉, Jules Stuart1,2,3, David Reens1, Colin D. Bruzewicz1, Danielle Braje1, John Chiaverini1, Paul W. Juodawlkis1, Jeremy M. Sage1,2 & Robert McConnell1

https://doi.org/10.1038/s41586-020-2981-6
Received: 16 January 2020
Accepted: 1 October 2020

Microwave atomic clocks have traditionally served as the ‘gold standard’ for precision
measurements of time and frequency. However, over the past decade, optical atomic
clocks1–6 have surpassed the precision of their microwave counterparts by two orders
of magnitude or more. Extant optical clocks occupy volumes of more than one cubic
metre, and it is a substantial challenge to enable these clocks to operate in field
environments, which requires the ruggedization and miniaturization of the atomic
reference and clock laser along with their supporting lasers and electronics4,7,8,9. In
terms of the clock laser, prior laboratory demonstrations of optical clocks have relied
on the exceptional performance gained through stabilization using bulk cavities,
which unfortunately necessitates the use of vacuum and also renders the laser
susceptible to vibration-induced noise. Here, using a stimulated Brillouin scattering
laser subsystem that has a reduced cavity volume and operates without vacuum, we
demonstrate a promising component of a portable optical atomic clock architecture.
We interrogate a 88Sr+ ion with our stimulated Brillouin scattering laser and achieve a
clock exhibiting short-term stability of 3.9 × 10−14 over one second—an improvement
of an order of magnitude over state-of-the-art microwave clocks. This performance
increase within a potentially portable system presents a compelling avenue for
substantially improving existing technology, such as the global positioning system,
and also for enabling the exploration of topics such as geodetic measurements of the
Earth, searches for dark matter and investigations into possible long-term variations
of fundamental physics constants10–12.
The ability to precisely measure time with a portable system has long been essential to navigation. The necessity for accurate and portable timekeeping inspired the development of Harrison’s marine chronometer nearly 300 years ago and continues to this day, reflected in modern societal reliance on the global positioning system (GPS). Recently, optical atomic clocks, operating at frequencies of hundreds of terahertz, have achieved performance far surpassing that of the best microwave clocks and have substantially advanced the precision with which time (and, equivalently, distance) can be measured. However, portable implementations of these optical clocks will require substantial modifications to the existing clock architecture, including miniaturization of the clock laser whose performance is central to the operation of the clock. A key challenge is to maintain the frequency stability of the clock laser while reducing its size. This stability is required for the laser to (1) remain within the vicinity of the atom’s narrow-linewidth transition during the period of time before the clock feedback is engaged (minutes) and (2) remain locked to the atomic transition between feedback cycles (milliseconds). These stringent requirements have thus far eliminated all possible candidates for clock interrogation apart from bulk-cavity-stabilized (BCS) lasers13–15, which exhibit currently unrivalled narrow linewidths of <1 Hz under high-vacuum operation, but otherwise are unwieldy and prone to vibration4,7,16,17.
A promising portable alternative to these BCS lasers has recently emerged via generation of stimulated Brillouin scattering (SBS) light in an ultrahigh quality factor (Q) resonator18–27. In comparison to the BCS laser, the SBS laser offers the advantages of reduced cavity volume, operation without vacuum, the ability to be rigidly mounted to any flat surface, and a potentially higher tolerance to vibration. Despite these properties, the application of the SBS laser to state-of-the-art atomic physics has yet to be demonstrated, primarily owing to the laser’s substantial frequency drift in response to temperature change21. Here, we overcome the challenges of drift by applying a recently developed technique28 to sense and control the SBS laser’s long-term frequency fluctuations in the regime of a few hundred hertz. We utilize the stabilized SBS light in the demonstration of a strontium-ion optical clock, which breaks the long-standing clock paradigm of requiring a BCS laser to serve as the master oscillator. Through a clock self-comparison measurement, we achieve a fractional frequency stability of 3.9 × 10⁻¹⁴/√τ (with the interrogation time τ in s), beyond the short-term stability achievable by the best microwave clocks29 and
1Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, MA, USA. 2Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, USA. 3These authors contributed equally: William Loh, Jules Stuart. ✉e-mail: William.Loh@ll.mit.edu

[Figure 1. Panels: a, Resonator input (Pump 1 and Pump 2 near 1,348 nm, separated by ~180 MHz); b, Resonator output (SBS 1 and SBS 2, separated by ~180 MHz); c, Feedforward (Pol. beat, ×39, correction signal); d, Output after AOM (stabilized SBS out); e, Lock to ion (⁸⁸Sr⁺); f, SBS laser subsystem; g, photograph of the resonator; h, feedforward circuit (PLL, VCO, LPF, ramp, AOM, SOA, frequency doubler).]
Fig. 1 | SBS laser setup and stabilization scheme. a–e, Illustration of the steps (a–e) required for SBS stabilization. Two orthogonal polarization SBS signals are generated whose beat note is applied to an acousto-optic modulator (AOM) to compensate for the SBS laser’s frequency drift. See text for details. f, Diagram of the SBS laser setup. The AOM, electro-optic modulator (EOM), semiconductor optical amplifier (SOA), polarization rotator (Rot.), polarization beam splitter (PBS), photodetector (PD), local oscillator (LO) and proportional-integral-derivative (PID) controller enable the independent locking of a pump laser to two orthogonal polarization modes of the resonator (Res.). Two counter-propagating SBS lasers (SBS 1 and SBS 2) are created from the pump lasers. The blue shaded area designates the region where feedforward stabilization is applied to the SBS laser and will be continued further in h. g, Photograph of the SBS resonator resting within a copper enclosure and placed next to a quarter dollar. The SBS resonator including the enclosure and lid occupies a total space of 3.1 inch × 3.5 inch × 1 inch. h, Diagram of the SBS stabilization circuitry. The beat note of two SBS lasers is first converted to a voltage signal through a voltage-controlled oscillator (VCO) operating in a phase-locked loop (PLL). Afterwards, a linear ramp subtraction, low-pass filter (LPF) and factor of 39 multiplication are applied before this signal is used for correcting the SBS laser drift. The dashed box indicates the elements that comprise the PLL.
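The signal chain described for panel h (beat-note deviation, low-pass filter, multiplication by 39, linear ramp) can be sketched as a small streaming model. This is a minimal illustration only and not the authors' circuit: the class name, the first-order filter, the `alpha` coefficient, the ramp units and the drift model in the usage loop are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class FeedforwardChain:
    """Toy model of the chain in panel h. Parameter values are illustrative."""

    gain: float = 39.0          # multiplication factor quoted in the caption
    ramp_hz_per_s: float = 0.0  # linear-ramp slope, referred to the laser (assumed)
    alpha: float = 1.0          # low-pass smoothing coefficient, 0 < alpha <= 1 (assumed)
    filtered_hz: float = 0.0    # internal low-pass filter state

    def correction_hz(self, beat_deviation_hz: float, t_s: float) -> float:
        """Return the frequency correction to subtract from the laser at time t_s."""
        # First-order low-pass filter on the beat-note deviation.
        self.filtered_hz += self.alpha * (beat_deviation_hz - self.filtered_hz)
        # Scale to the laser drift and add the linear-ramp term.
        return self.gain * self.filtered_hz + self.ramp_hz_per_s * t_s


# Hypothetical drift model: laser drift = 39 x beat deviation + linear ramp.
chain = FeedforwardChain(ramp_hz_per_s=2.0)
residuals = []
for step in range(5):
    t_s = float(step)
    beat_hz = 10.0 * step  # made-up beat-note deviation
    laser_drift_hz = 39.0 * beat_hz + 2.0 * t_s
    residuals.append(laser_drift_hz - chain.correction_hz(beat_hz, t_s))
```

With `alpha = 1.0` the filter passes the input unchanged, so the residuals vanish for this idealized drift model. A smaller `alpha` trades noise rejection against lag in the correction.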
close to an order of magnitude away from state-of-the-art laboratory single-ion clocks3.
For our clock demonstration, we interrogate a ⁸⁸Sr⁺ ion using a fibre-cavity SBS laser with radiation at 1,348 nm that is frequency doubled to reach the narrow-linewidth ⁸⁸Sr⁺ clock transition at 674 nm (ref. 30). Figure 1a–e depicts our overall strategy for achieving a stable SBS optical atomic clock. Starting from two orthogonally polarized pump lasers input into a high-Q optical-fibre resonator (Fig. 1a), the resonator generates two orthogonally polarized SBS outputs that are Stokes shifted from their respective pumps by about 12.5 GHz (Fig. 1b). The two SBS lasers interfere on a photodetector to produce a 180-MHz beat note (Fig. 1c), the frequency deviation of which predominantly corresponds to the temperature drift of the SBS resonator28. This method of temperature sensing is based on techniques that connect the differential temperature sensitivity between two orthogonal polarization resonator modes to a measurement of temperature change31–33. However, owing to issues associated with both the detection and control of small shifts in temperature, the best prior stabilization efforts have so far been limited to a range of about 10 μK (ref. 34). Here, as a consequence of the exceptionally narrow SBS lasing linewidth, the resolution of our temperature sensor reaches below 100 nK. In order to circumvent the need for direct control of temperature, we apply the dual-polarization beat note (Pol. beat) as a feedforward correction to the SBS laser’s frequency (Fig. 1d). This correction brings the SBS laser’s stability to a regime where it can be used to interrogate a ⁸⁸Sr⁺ ion optical clock (Fig. 1e).
The SBS laser (Fig. 1f) consists of a single pump laser whose output at 1,348 nm is split into two separate paths (see Methods section ‘SBS laser setup’). On one path, an acousto-optic modulator (AOM) is used both to shift the light by approximately 180 MHz and to enable independent control of its frequency. Afterwards, the polarization of one of the pump beams is rotated by 90°, and the two pump beams are then sent into the SBS resonator in opposite directions to accomplish Pound–Drever–Hall (PDH) locking35 to the resonator’s two orthogonal polarization modes. The SBS resonator itself exhibits a loaded Q of 1.9 × 10⁸ and consists of 2 m of optical fibre wound around a 2-inch-diameter mandrel that rests within a 3.1 inch × 3.5 inch × 1 inch temperature-controlled copper enclosure (Fig. 1g). The two pump lasers each generate their own counter-propagating SBS beams, which upon leaving the resonator are both coupled out of the system through a pair of circulators. A portion of the SBS light is also monitored and used to stabilize the amplitude of the SBS laser. Past the circulator, the outputs of both orthogonal polarization SBS lasers are interfered on a photodetector, which serves as our method for sensing temperature change.
Figure 1h presents the feedforward circuitry used in the stabilization of the SBS laser (see Methods section ‘Feedforward implementation’). After the 180-MHz SBS temperature error signal is generated, a series

Article
[Figure 2. Panels: a, feedforward diagram (free-running SBS, Pol. beat ×39, subtraction, feedforward AOM); b–d, frequency drift versus time (min) for the free-running SBS, Pol. beat ×39, subtraction and drift-subtracted SBS traces; e, Δf/f and ΔT (K) versus time (s) for the free-running and drift-subtracted SBS laser, with an ideal 1/√τ guide and markers at 22 Hz and 220 Hz.]
Fig. 2 | SBS laser drift cancellation procedure. a, Simplified diagram illustrating the correction of the SBS frequency drift via subtraction of the free-running SBS laser and a frequency multiplied (×39) polarization beat signal (Pol. beat ×39). b, Experimental time series of the free-running SBS laser (red) and the polarization beat-note (blue) drift. A numerical subtraction of the two (black) produces a residual linear drift. c, Numerical removal of the linear drift from the free-running SBS and polarization beat-note time series revealing the underlying temperature-induced frequency drift. The excellent agreement between the two demonstrates that the correction signal accurately tracks the SBS temperature drift. d, Subtraction of the SBS and polarization beat-note signals in Fig. 2c. The drift is numerically cancelled through the linear ramp and temperature correction procedure. e, Fractional frequency (Δf/f) noise measurements and corresponding extracted temperature deviation (ΔT) of the SBS laser. The green shaded section indicates the region where the SBS laser noise is gradually averaging down. We lock the SBS laser to the ion at these timescales for the demonstration of the optical atomic clock. The blue section indicates where the SBS laser experiences drift that can be mostly compensated for via subtraction. The red shaded section indicates where the SBS laser drift dominates. After subtraction, the resulting noise of the SBS laser reveals a short-term frequency variation of 22 Hz and a long-term frequency drift of 220 Hz, which can be further reduced by locking to the ion.
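The numerical procedure summarized in this caption (scale the polarization beat by 39, subtract it from the free-running trace, then remove the residual linear drift) can be sketched offline on synthetic data. The traces below are made up for illustration and are not the measured data; the function names, the sinusoidal thermal wander and the sampling interval are assumptions.

```python
import math


def linear_fit(t, y):
    """Least-squares slope and intercept of y against t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def cancel_drift(t, sbs_hz, beat_hz, factor=39.0):
    """Subtract the scaled beat note, then remove the residual linear drift."""
    diff = [s - factor * b for s, b in zip(sbs_hz, beat_hz)]
    slope, intercept = linear_fit(t, diff)
    residual = [d - (slope * ti + intercept) for ti, d in zip(t, diff)]
    return slope, residual


# Synthetic traces: 12 min sampled once per second (hypothetical values).
t = [float(i) for i in range(720)]
beat = [385.0 * math.sin(2.0 * math.pi * ti / 300.0) for ti in t]  # beat-note wander, Hz
true_slope = 160e3 / 60.0                                           # 160 kHz per minute, in Hz/s
sbs = [39.0 * b + true_slope * ti for b, ti in zip(beat, t)]        # thermal term + linear drift

slope, residual = cancel_drift(t, sbs, beat)
```

In this idealized case the fitted slope recovers the injected linear drift and the residual is zero to within rounding. Real traces would leave a bounded residual set by measurement noise and by any mismatch in the scaling factor.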
of operations is applied to prepare this signal to serve as a correction to the SBS laser. Using a phase-locked loop (PLL), the temperature error is first converted to a voltage that is low-pass filtered and amplified before it is converted back again to a radio-frequency (RF) signal. A net frequency multiplication factor of 39 is achieved through this process, which enables the temperature error to precisely match and cancel the SBS frequency drift. The subtraction of a ramp signal also makes it possible to compensate for a residual linear SBS drift. After correction, the SBS laser output is frequency doubled to reach 674 nm for interrogation of the ⁸⁸Sr⁺ ion. Together, the elements of Fig. 1f–h comprise the SBS laser subsystem whose output is directed onto a ⁸⁸Sr⁺ ion for operation of an optical atomic clock.
Figure 2a demonstrates the feedforward procedure for correcting the SBS laser’s drift. In comparison to the feedback approach of ref. 28, the use of feedforward enables direct control of the SBS laser’s frequency with advantages in avoiding the slow servo response of controlling temperature at the fibre core and also in circumventing unintentional length shifts that arise from thermal expansion in the underlying copper plate. The beat note between two orthogonal polarization SBS lasers, which is used as a sensor for temperature change, is subtracted from one free-running SBS laser to stabilize its frequency drift. To demonstrate this procedure, both the frequency of the free-running SBS laser’s beat note with a reference BCS laser and the frequency of the dual-polarization beat note (see Methods section ‘Measurement of the SBS laser noise’) are experimentally measured at time intervals of 1 ms and plotted for the wavelength of 1,348 nm over a period of 12 min (Fig. 2b). The polarization beat is also numerically multiplied by the empirically determined correction factor of 39 in order to account for the common-mode suppression of the extracted temperature shifts. The resulting multiplied signal serves as the feedforward correction to temperature drift, which on numerical subtraction from the free-running SBS laser yields a residual linear drift of 160 kHz min⁻¹. This linear drift results from the individual linear drifts of the SBS and polarization beat signals, which are in opposing directions for the case of Fig. 2b, and represents a parasitic frequency shift that our temperature sensing technique cannot account for. We attribute the

[Figure 3. Panels: a, b, frequency drift versus time (min), free running then stabilized with feedforward on; c, normalized power (dB) versus offset frequency (kHz), Δf = 50 Hz; d, Δf/f versus time (s) for the stabilized SBS laser and the Fig. 2e comparison, with markers at 48 Hz and 170 nK.]
Fig. 3 | SBS laser subsystem stabilization. a, Measurement of the SBS laser’s frequency drift (solid line) with and without feedforward stabilization. The measurement interval was 1 ms. b, Zoomed-in plot of the SBS frequency drift. The average long-term frequency deviation is 310 Hz. c, Measured lineshape of the stabilized SBS laser (red solid line) and corresponding Voigt fit with R² = 0.93 (dashed line). The measured linewidth (Δf) is 50 Hz. The spectrum is taken with a sweep time of 225 ms and a resolution bandwidth of 20 Hz. d, Fractional frequency (Δf/f) noise comparison between the stabilized SBS laser (red line) and the ideal drift-cancelled SBS laser of Fig. 2e. The stabilized SBS exhibits a short-term frequency variation of 48 Hz. Long-term temperature stabilization at the level of 170 nK is also experimentally achieved.
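The figures quote noise both as a fractional value Δf/f and as an absolute deviation in hertz. The two are related through the optical carrier frequency c/λ, and the conversion can be checked with a few lines. This is a sketch for orientation: the function name is hypothetical, and the nominal wavelengths of 674 nm and 1,348 nm are used in place of the exact transition frequency.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum (exact SI value)


def fractional_to_hz(fractional: float, wavelength_nm: float) -> float:
    """Convert a fractional frequency deviation to hertz at a given wavelength."""
    carrier_hz = C_M_PER_S / (wavelength_nm * 1e-9)
    return fractional * carrier_hz


# Fractional values quoted in the main text, converted at the nominal wavelengths.
short_term_674_hz = fractional_to_hz(1.1e-13, 674.0)    # stabilized subsystem, short term
long_term_674_hz = fractional_to_hz(1.4e-12, 674.0)     # stabilized subsystem, long term
bounded_1348_hz = fractional_to_hz(1.0e-12, 1348.0)     # drift-subtracted laser at 1,348 nm
```

The results land close to the quoted 48 Hz, 620 Hz and 220 Hz; small differences are expected because the fractional values are given to two significant figures.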
observed linear drift to a slow relaxation of the SBS resonator over time36 (see Methods section ‘SBS laser residual linear drift’), which is projected to equilibrate on a timescale of months. Rather than performing this subtraction step first, we instead numerically remove the linear drift from the SBS and polarization-beat traces, which reveals the underlying temperature-induced drift of the SBS laser frequency (Fig. 2c). The correction signal, which now occupies a span of 30 kHz at the wavelength of 1,348 nm, shows excellent agreement with the remaining SBS laser drift for cancellation. With the linear drift removed, a subtraction of the polarization-beat frequency from the SBS laser frequency yields a stabilized SBS laser frequency with about 220-Hz frequency fluctuations (Fig. 2d).
Figure 2e shows the fractional frequency noise, that is, the root-mean-square (r.m.s.) frequency shift of the SBS laser normalized to the laser’s centre frequency, derived from the time series traces of Fig. 2b–d. The free-running 1,348-nm SBS laser reaches a minimum noise level of 1.4 × 10⁻¹³ (corresponding to a linewidth of 30 Hz) at 10 ms but becomes unbounded at longer timescales and increases to 5.8 × 10⁻¹⁰ at 100 s. When the measured drift is subtracted to account for both the linear and temperature-induced drift, the SBS laser maintains its performance of 1.0 × 10⁻¹³ (22 Hz) at short timescales and experiences a >500-fold reduction in drift over the long term. The SBS laser frequency excursions become bounded at the value of 1.0 × 10⁻¹² or 220 Hz, which corresponds to temperature stabilization of the SBS laser at a level below 120 nK.
We experimentally implement the numerical feedforward procedure of Fig. 2 and apply the corrections onto the SBS laser’s frequency via an AOM (Fig. 3a). The linear drift is accounted for by a ramp generator, and the factor of 39 multiplication is implemented through a PLL. The stabilized output is frequency doubled to 674 nm and is measured against a reference BCS laser for characterization of the SBS laser drift. For the remainder of this Article, we will refer to the frequency-doubled and amplified SBS laser as the ‘SBS laser subsystem’, whose output at 674 nm is used to interrogate the clock ion. The SBS laser subsystem, operating at 674 nm, starts initially in a free-running state for the first 6.5 min of measurement and drifts by 400 kHz within this time. Once the feedforward correction is applied, the SBS frequency drift flattens to a value near zero, as indicated by the horizontal dashed line guide of Fig. 3a. A zoomed-in trace of the stabilized SBS frequency drift (Fig. 3b) shows close agreement to Fig. 2d (accounting for the frequency doubling), and indicates that our experimental implementation of the feedforward stabilization accurately compensates for both linear and temperature-induced SBS laser drift.
The lineshape of the SBS laser subsystem (Fig. 3c), measured at 674 nm via beating with a BCS laser, further demonstrates the laser’s exceptionally low short-term noise. The spectrum exhibits a linewidth of 50 Hz. This value of linewidth is confirmed by the measured fractional frequency noise of the stabilized SBS laser (Fig. 3d), which reaches a minimum of 1.1 × 10⁻¹³ at 60 ms and corresponds to a frequency deviation of 48 Hz at 674 nm. At long timescales, the noise level becomes 1.4 × 10⁻¹² (620 Hz), which is slightly larger than the frequency excursions found with the numerical drift-cancelling procedure of Fig. 2e. We attribute this difference in drift to noise in the electronics used for feedforward stabilization. At the level of 620 Hz, our achieved frequency drift corresponds to temperature stabilization of the SBS laser subsystem at a level below 170 nK.
To demonstrate experimentally the practical capability of the SBS laser, we use the subsystem to run an atomic clock, stabilizing the laser to the narrow-linewidth S₁/₂ ↔ D₅/₂ quadrupole clock transition in ⁸⁸Sr⁺ (0.4 Hz natural linewidth). We interrogate a single strontium ion confined 50 μm above the surface of a microfabricated surface-electrode trap within a cryogenic ultrahigh-vacuum apparatus37. As shown in Fig. 4a, the clock interrogation light is amplified through a series of injection-locked lasers followed by tapered amplifiers, with fibre-noise cancellation stages to mitigate phase noise picked up as the clock laser is routed between and across rooms within optical fibre; a single flipper mirror allows us to select either an existing BCS laser (see Methods section ‘Characteristics of the BCS laser’) or the SBS laser subsystem as the initial seed for the injection stages. To maximize the coherence time of the ion’s optical transition, we employ a system of passive magnetic field stabilization using persistent superconducting currents38 and active laser-vibration compensation using an interferometric scheme similar

[Figure 4. Panels: a, beam-path schematic (BCS laser or SBS subsystem, AOM, ×2 frequency doubling, injected laser, tapered amplifier, AOM, cryogenic vacuum chamber); b, ground state probability versus probe phase (rad); c, ground state probability versus detuning (kHz), Δf = 370 Hz; d, Δf/f versus time (s) with the fit 3.9 × 10⁻¹⁴/√τ.]
Fig. 4 | SBS laser optical clock. a, Schematic of laser beam path to the clock chamber. A flipper mirror allows the SBS laser subsystem to be interchanged with a BCS laser as the master frequency source. The injected laser and tapered amplifier represent a series of two injection locking and amplification stages, respectively. An AOM is used to finely adjust the laser frequency before the beam is focused onto an ion inside the cryogenic vacuum chamber used as the clock chamber. b, Measured Ramsey fringes on the clock transition. Two π/2 pulses are applied around a τ = 1 ms interrogation period. The ground state probability is measured as a function of the phase of the probe pulse. c, Spectroscopy of the |5S₁/₂, mJ = −1/2⟩ → |4D₅/₂, mJ = −3/2⟩ transition in ⁸⁸Sr⁺ measured with the SBS laser subsystem. Spectroscopic measurements are interleaved with frequency corrections to keep the SBS laser subsystem locked to the atomic transition. The ground state probability is measured as a function of frequency detuning relative to the clock transition. The vertical blue error bars represent 1σ error derived from photon counting statistics. d, Fractional frequency (Δf/f) noise of the difference frequency between interleaved clocks. The measured fractional frequency noise is divided by √2 to estimate the error of a single clock, assuming even distribution of error in the correction signals. The blue points represent the frequency noise at a selection of averaging times, and the vertical blue bars indicate the 1σ error in this calculation46. From a fit to these data (dashed red line) assuming a purely white noise spectrum, we obtain the function 3.9 × 10⁻¹⁴/√τ.
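For white frequency noise, instability averages down as the inverse square root of the averaging time, which is the form fitted in panel d. The sketch below evaluates that scaling from the fitted 1-s value and inverts it to estimate how long one would need to average to reach a chosen target. The function names and the example target are assumptions, and the extrapolation holds only while the noise stays white.

```python
import math

SIGMA_1S = 3.9e-14  # fitted instability at tau = 1 s, from the caption


def white_noise_instability(tau_s: float, sigma_1s: float = SIGMA_1S) -> float:
    """Instability after averaging for tau_s seconds, assuming white frequency noise."""
    return sigma_1s / math.sqrt(tau_s)


def averaging_time_for(target: float, sigma_1s: float = SIGMA_1S) -> float:
    """Averaging time in seconds needed to reach a target instability."""
    return (sigma_1s / target) ** 2


for tau_s in (1.0, 10.0, 100.0, 1000.0):
    print(f"tau = {tau_s:6.0f} s  ->  {white_noise_instability(tau_s):.2e}")
```

Because the time scales with the square of the ratio, improving the 1-s instability by a factor of two shortens the averaging time needed for a given target by a factor of four.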
to fibre noise cancellation39. With these techniques, we perform a series of Ramsey experiments and deduce a coherence time of 2.9 ms with the BCS laser, which we take as an upper bound for the performance of the clock experiment. Assuming that frequency fluctuations in the laser are the dominant decoherence mechanism, we infer an effective laser linewidth of 1/(2π × 2.9 ms) = 55 Hz, which is close to the SBS laser subsystem’s linewidth in Fig. 3c. Thus we conclude that uncompensated noise in our experiment (for example, from optics subject to acoustic vibrations or magnetic field instability at the ion) contributes at about the same level as noise from the SBS laser subsystem (see Methods section ‘Limits of the optical clock measurement’ for discussion of noise sources). Altogether, these noise sources limit the clock interrogation time we achieve, and therefore set the ultimate limit in clock performance.
An AOM is used both for scanning the SBS laser subsystem over the clock transition and for locking the laser’s frequency to the atomic resonance. To discipline the SBS laser subsystem to the atomic resonance frequency, we create an error signal via a Ramsey experiment consisting of two π/2 pulses on the clock transition, separated by an interrogation time τ = 1 ms. As the phase of the second π/2 pulse is varied, the population in the lower clock state traces out a sine curve (Fig. 4b). The slope of this signal reaches a maximum when the second pulse is 90° out of phase with the first pulse. Variations in the laser frequency during the interrogation time result in an additional phase shift and are thus mapped to the ion’s state distribution. Between interrogation cycles, the state of the ion must be detected and then re-initialized into the lower clock level; this leads to a 1.85-ms dead time, during which the system is insensitive to frequency fluctuations of the laser (see Methods section ‘Clock protocol and simulation’ for more details of the lock procedure). As a demonstration of the efficacy of the atomic lock, we perform Rabi spectroscopy of the clock transition using the SBS laser subsystem, interleaved with frequency corrections following the protocol above (Fig. 4c). These spectroscopic measurements are performed with a pulse time of 10 ms (to avoid Fourier broadening of the line), which leads to a longer clock dead time of about 11 ms for these measurements. The width of the measured feature, 370 Hz, is in close correspondence with the simulated value of 250 Hz, derived from numerical application of the clock protocol to the free-running SBS laser subsystem noise measured in Fig. 3d, given the effective dead time of 11 ms. This provides further evidence that noise mechanisms downstream from the laser source do not substantially affect measurements with the SBS laser subsystem.
In order to assess the stability of the clock when running with the SBS laser subsystem, we perform a self-comparison measurement via two independently operated clock signals generated by interleaving distinct sets of correction signals applied to the laser (see Methods section

‘Clock protocol and simulation’). This technique has been previously used to characterize clock performance when two independent clocks are not available40,41 and is known to accurately capture the short-term clock stability, including ion projection noise and the Dick effect due to dead time in the clock protocol. The self-comparison technique is insensitive to long-term frequency drifts of the ion, which if present would be common to both clock signals, but at the same time underestimates the performance of the clock owing to the extra dead time incurred through the interleaving operation. We note that our clock is interrogated on the magnetically sensitive |5S₁/₂, mJ = −1/2⟩ → |4D₅/₂, mJ = −3/2⟩ transition, which would limit long-term stability if magnetic field drifts were not cancelled; however, a simple protocol for ⁸⁸Sr⁺ clocks has been developed42 that eliminates both first-order Zeeman and electric quadrupole shifts of the clock transition and can be implemented in future measurements.
For a 1-ms interrogation time followed by a 1.85-ms recooling and state preparation period, the effective dead time for each clock is 4.7 ms and results in a measured clock stability of 3.9 × 10⁻¹⁴/√τ (Fig. 4d). This value of frequency stability agrees well with numerical simulations we performed of the optical clock using the measured SBS laser subsystem noise, which predict the clock to operate at the level of 4.1 × 10⁻¹⁴/√τ for 4.7 ms of dead time. The good correspondence between measurement and simulation suggests that under normal clock operation with 1.85 ms of dead time, the clock stability would reach 2.5 × 10⁻¹⁴/√τ.
The advances made here to the SBS laser bring that laser’s frequency fluctuations into a new regime that approaches the level of performance only currently achievable by BCS lasers. When the SBS laser subsystem is used to interrogate an atomic system, the combination offers the potential for creating portable optical atomic clocks with stability surpassing that of state-of-the-art microwave clocks by an order of magnitude or more. For the realization of a truly portable clock, future effort is required in miniaturizing the atomic physics package and in maturing the technology of compact frequency combs43 and SBS lasers. Additional benefits in size and vibration tolerance may be gained by replacing the fibre SBS resonator with an integrated resonator19,23 pending further improvements to the linewidth and stability of chip-based SBS lasers. Although not directly explored in this work, other promising architectures for ultralow-noise lasers44,45 may also benefit from the techniques of temperature stabilization developed here.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2981-6.

11. Ludlow, A. D., Boyd, M. M., Ye, J., Peik, E. & Schmidt, P. O. Optical atomic clocks. Rev. Mod. Phys. 87, 637–701 (2015).
12. McGrew, W. F. et al. Atomic clock performance enabling geodesy below the centimetre level. Nature 564, 87–90 (2018).
13. Young, B. C., Cruz, F. C., Itano, W. M. & Bergquist, J. C. Visible lasers with subhertz linewidths. Phys. Rev. Lett. 82, 3799–3802 (1999).
14. Jiang, Y. Y. et al. Making optical atomic clocks more stable with 10⁻¹⁶-level laser stabilization. Nat. Photon. 5, 158–161 (2011).
15. Kessler, T. et al. A sub-40-mHz-linewidth laser based on a silicon single-crystal optical cavity. Nat. Photon. 6, 687–692 (2012).
16. Davila-Rodriguez, J. et al. Compact, thermal-noise-limited reference cavity for ultra-low-noise microwave generation. Opt. Lett. 42, 1277–1280 (2017).
17. Didier, A. et al. Ultracompact reference ultralow expansion glass cavity. Appl. Opt. 57, 6470–6473 (2018).
18. Grudinin, I. S., Matsko, A. B. & Maleki, L. Brillouin lasing with a CaF₂ whispering gallery mode resonator. Phys. Rev. Lett. 102, 043902 (2009).
19. Lee, H. et al. Chemically etched ultrahigh-Q wedge-resonator on a silicon chip. Nat. Photon. 6, 369–373 (2012).
20. Kabakova, I. V. et al. Narrow linewidth Brillouin laser based on chalcogenide photonic chip. Opt. Lett. 38, 3208–3211 (2013).
21. Loh, W. et al. Dual-microcavity narrow-linewidth Brillouin laser. Optica 2, 225–232 (2015).
22. Otterstrom, N. T., Behunin, R. O., Kittlaus, E. A., Wang, Z. & Rakich, P. T. A silicon Brillouin laser. Science 360, 1113–1116 (2018).
23. Gundavarapu, S. et al. Sub-hertz fundamental linewidth photonic integrated Brillouin laser. Nat. Photon. 13, 60–67 (2019).
24. Geng, J. et al. Highly stable low-noise Brillouin fiber laser with ultranarrow spectral linewidth. IEEE Photonics Technol. Lett. 18, 1813–1815 (2006).
25. Shee, Y. G., Al-Mansoori, M. H., Ismail, A., Hitam, S. & Mahdi, M. A. Multiwavelength Brillouin-erbium fiber laser with double-Brillouin-frequency spacing. Opt. Express 19, 1699–1706 (2011).
26. Tow, K. H. et al. Linewidth-narrowing and intensity noise reduction of the 2nd order Stokes component of a low threshold Brillouin laser made of Ge₁₀As₂₂Se₆₈ chalcogenide fiber. Opt. Express 20, B104–B109 (2012).
27. Debut, A., Randoux, S. & Zemmouri, J. Linewidth narrowing in Brillouin lasers: theoretical analysis. Phys. Rev. A 62, 023803 (2000).
28. Loh, W., Yegnanarayanan, S., O’Donnell, F. & Juodawlkis, P. W. Ultra-narrow linewidth Brillouin laser with nanokelvin temperature self-referencing. Optica 6, 152–159 (2019).
29. Jefferts, S. R., Meekhof, D. M., Shirley, J. H., Stepanovic, M. & Parker, T. E. Accuracy results from NIST-F1 laser-cooled cesium primary frequency standard. In Proc. IEEE/EIA Int. Frequency Control Symposium and Exhibition 714–717 (IEEE, 2000).
30. Madej, A. A., Dubé, P., Zhou, Z., Bernard, J. E. & Gertsvolf, M. ⁸⁸Sr⁺ 445-THz single-ion reference at the 10⁻¹⁷ level via control and cancellation of systematic uncertainties and its measurement against the SI second. Phys. Rev. Lett. 109, 203002 (2012).
31. Strekalov, D. V., Thompson, R. J., Baumgartel, L. M., Grudinin, I. S. & Yu, N. Temperature measurement and stabilization in a birefringent whispering gallery mode resonator. Opt. Express 19, 14495–14501 (2011).
32. Fescenko, I. et al. Dual-mode temperature compensation technique for laser stabilization to a crystalline whispering gallery mode resonator. Opt. Express 20, 19185–19193 (2012).
33. Weng, W. et al. Nanokelvin thermometry and temperature control: beyond the thermal noise limit. Phys. Rev. Lett. 112, 160801 (2014).
34. Lim, J. et al. Probing 10 μK stability and residual drifts in the cross-polarization dual-mode stabilization of single-crystal ultrahigh-Q optical resonators. Light Sci. Appl. 8, 1 (2019).
35. Drever, R. W. P. et al. Laser phase and frequency stabilization using an optical resonator. Appl. Phys. B 31, 97–105 (1983).
36. Storz, R., Braxmaier, C., Jäck, K., Pradl, O. & Schiller, S. Ultrahigh long-term dimensional stability of a sapphire cryogenic optical resonator. Opt. Lett. 23, 1031–1033 (1998).
37. Bruzewicz, C. D., McConnell, R., Chiaverini, J. & Sage, J. M. Scalable loading of a two-dimensional trapped-ion array. Nat. Commun. 7, 13005 (2016).
38. Wang, S. X., Labaziewicz, J., Ge, Y., Shewmon, R. & Chuang, I. L. Demonstration of a quantum logic gate in a cryogenic surface-electrode ion trap. Phys. Rev. A 81, 062332 (2010).
39. Ma, L. S., Junger, P., Ye, J. & Hall, J. L. Delivering the same optical frequency at two places: accurate cancellation of phase noise introduced by an optical fiber or other time-varying path. Opt. Lett. 19, 1777–1779 (1994).
1. Hinkley, N. et al. An atomic clock with 10−18 instability. Science 341, 1215–1218 (2013). 40. Nicholson, T. L. et al. Comparison of two independent Sr optical clocks with 1 × 10−17
2. Bloom, B. J. et al. An optical lattice clock with accuracy and stability at the 10−18 level. stability at 103 s. Phys. Rev. Lett. 109, 230801 (2012).
Nature 506, 71–75 (2014). 41. Nicholson, T. L. et al. Systematic evaluation of an atomic clock at 2 × 10−18 uncertainty.
3. Brewer, S. M. et al. 27Al+ quantum-logic clock with a systematic uncertainty below 10−18. Nat. Commun. 6, 6896 (2015).
Phys. Rev. Lett. 123, 033201 (2019). 42. Dubé, P. et al. Electric quadrupole shift cancellation in single-ion optical frequency
4. Koller, S. B. et al. Transportable optical lattice clock with 7 × 10−17 uncertainty. Phys. Rev. standards. Phys. Rev. Lett. 95, 033001 (2005).
Lett. 118, 073601 (2017). 43. Spencer, D. T. et al. An optical frequency synthesizer using integrated photonics.
5. Godun, R. M. et al. Frequency ratio of two optical clock transitions in 171Yb+ and constraints Nature 557, 81–85 (2018).
on the time variation of fundamental constants. Phys. Rev. Lett. 113, 210801 (2014). 44. Liang, W. et al. Ultralow noise miniature external cavity semiconductor laser.
6. Huntemann, N., Sanner, C., Lipphardt, B., Tamm, Chr. & Peik, E. Single-ion atomic clock Nat. Commun. 6, 7371 (2015).
with 3 × 10−18 systematic uncertainty. Phys. Rev. Lett. 116, 063001 (2016). 45. Zhang, W. et al. Ultranarrow linewidth photonic-atomic laser. Laser Photonics Rev. 14,
7. Leibrandt, D. R., Thorpe, M. J., Bergquist, J. C. & Rosenband, T. Field-test of a robust, 1900293 (2020).
portable, frequency-stable laser. Opt. Express 19, 10278–10286 (2011). 46. Howe, D. A. The total deviation approach to long-term characterization of frequency
8. Liu, L. et al. In-orbit operation of an atomic clock based on laser-cooled 87Rb. stability. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 47, 1102–1110 (2000).
Nat. Commun. 9, 2760 (2018).
9. Takamoto, M. et al. Test of general relativity by a pair of transportable optical lattice Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in
clocks. Nat. Photon. 14, 411–415 (2020). published maps and institutional affiliations.
10. Maleki, L. & Prestage, J. Applications of clocks and frequency standards: from the routine
to tests of fundamental models. Metrologia 42, S145–S153 (2005). © The Author(s), under exclusive licence to Springer Nature Limited 2020

Article
Methods
SBS laser setup
A 1,348-nm external-cavity diode laser with an output of 30 mW serves as the pump of the SBS
laser. The pump light is split in a 90:10 ratio between two separate paths that are used to
probe the two orthogonal polarization modes of the optical resonator. The 90% path passes
through an additional AOM, which equalizes the power between the two paths. Each path is also
phase modulated and amplified such that the power reaches about 10 mW at the resonator input,
which produces approximately 5 mW of SBS light at the output. It is essential to the operation
of this SBS laser that the counter-propagating SBS light generated by one pump travels
alongside the opposite polarization pump that passes through the resonator. The PBS that
follows after the resonator then separates the SBS light from that of the orthogonal
polarization pump. The pump light is demodulated and used for PDH locking to the cavity
resonance. 10% of the output SBS power is tapped off and used to servo the SBS laser’s
amplitude, while the remaining 90% is redirected out by a circulator. The majority of the
passive optical fibre components for both polarization paths, including circulators, couplers
and polarization beam splitters, are spliced together and contained on a 5 inch × 5 inch
plate. The resonator itself sits within a copper enclosure comprising a copper platform that
rests on a layer of Viton. The enclosure isolates the resonator from the environment but
permits the ability to control the resonator’s temperature through the copper.
The output of the circulator is amplified by an SOA and then frequency doubled through a
fibre-pigtailed potassium titanyl phosphate (KTP) waveguide doubler for interrogation of the
⁸⁸Sr⁺ ion at 674 nm. For an amplified SBS power of about 60 mW at the input of the frequency
doubler, the 674-nm output after doubling reaches about 60 μW, corresponding to an efficiency
of 1.7% W⁻¹. In order to achieve stable injection locking of a subsequent slave laser, the
674-nm SBS power should exceed 10 μW. Through possible future integration of the SBS laser
with the frequency doubler, the coupling loss into the doubler may be avoided, thereby
eliminating the need for one or more of the amplification and/or injection locking stages.
This would require the union of an on-chip SBS laser, which has recently been demonstrated in
a low-confinement Si₃N₄ platform (ref. 23), with a chip-based frequency doubler, which
necessitates the use of a χ⁽²⁾ material such as LiNbO₃. The integration of these two material
systems may be accomplished through wafer bonding techniques using tapers that transition the
optical mode from Si₃N₄ to LiNbO₃ (ref. 47).

Feedforward implementation
In order to stabilize the SBS laser via feedforward, a frequency scaling factor of 39 must be
applied to the dual-polarization beat note. This factor is tuned to match the long-term
frequency movement of the polarization beat to the drift of the SBS laser (see Extended Data
Fig. 2). Although multiplication by 39 appears to be optimal, the accuracy of the correction
signal does not degrade substantially for scale factors of 39 ± 2. In addition to a frequency
scaling, the polarization beat must undergo (1) a ramp subtraction to cancel a linear
component of the SBS drift and (2) a 0.03 Hz low-pass filtering step that removes excess noise
at short timescales. These steps are accomplished by first phase-locking an RF waveform
generator (FM deviation = 30 kHz for Vin = ±2.5 V) to the polarization beat (FM, frequency
modulation). Owing to the intrinsically lower noise of the waveform generator at microwave
frequencies, the servo output becomes a voltage replica of the information contained in the
polarization beat note. After subtracting a ramp and low-pass filtering the result, the
voltage signal is sent to a tunable amplifier with its amplification factor set to 2. The
output of the amplifier is used to control the frequency of a second signal generator
operating with FM deviation = 1.075 MHz and Vin = ±5 V. The combination of the amplification
factor along with the chosen FM deviation values produces the scale factor of 39 applied to
the polarization beat.

Measurement of the SBS laser noise
The SBS laser’s frequency drift, noise and spectrum are all measured by interfering the
frequency-doubled SBS output with an independent bulk-cavity-stabilized laser operating at
674 nm. After photodetection, a 0–200 MHz signal is generated whose noise characteristics are
assumed to all be attributed to the SBS laser. In contrast, the 180-MHz dual-polarization beat
note is directly produced by separately interfering two orthogonal-polarization SBS signals.
For measuring drift, the frequencies of the SBS laser and dual-polarization beat note are
tracked across two frequency counters that are simultaneously triggered. Alternatively, for
measuring the spectrum, the SBS signal is mixed to 5 MHz and directly detected on a spectrum
analyser.
The frequency noise of the SBS laser, before and after feedforward, is depicted in Extended
Data Fig. 1. The feedforward system adds a slight amount of additional noise to the SBS laser
including spurious peaks across the spectrum, but otherwise does not substantially degrade the
SBS laser’s performance. The white-noise floor is slightly increased to 7 Hz² Hz⁻¹. The
feedforward-enabled SBS laser linewidth is estimated to be 51 Hz, which is found by
integrating the noise spectrum (ref. 48), and agrees with other linewidth estimations based on
a direct spectrum measurement and on a fractional frequency noise measurement. After
accounting for the frequency doubling, this measured SBS laser frequency noise is similar in
spectral shape and white-noise floor as compared to the 20-Hz-linewidth SBS laser reported in
ref. 28.

SBS laser residual linear drift
The residual linear drift of the SBS laser operating at 1,348 nm is tracked across two
separate but otherwise identical resonators in order to monitor how the amplitude of this
drift changes with time (Extended Data Fig. 3). The first resonator starts at a linear drift
of 1.6 kHz s⁻¹ and reaches a drift of 0.2 kHz s⁻¹ by the end of 79 days. The second resonator,
which is used for the datasets contained in this work, starts at a linear drift of
2.8 kHz s⁻¹ and reaches 0.6 kHz s⁻¹ after 13 days. An exponential fit to the plotted data
points across both resonators reveals a 1/e time constant of 6.9 days. In all of the cases
reported, the temperature correction factor remained constant at 39.
In our initial experiments with the SBS laser subsystem, we used a BCS laser as a stable
reference to monitor the linear drift of the SBS laser by tracking the beat note between the
two optical signals. This use of the BCS laser was merely a convenience, and is not required
for the clock application. When nothing about the drift of the SBS laser is known, the atom
itself can be used as a stable frequency reference. The frequency of the full SBS laser
subsystem must first be brought to within a few tens of megahertz of the atomic resonance
frequency; this degree of frequency determination is afforded by commercial wavelength meters
based on Fizeau interferometers. When the laser frequency is close to the clock transition, we
may perform Rabi spectroscopy by scanning the frequency of the laser using an AOM and search
for atomic transitions. There are multiple transitions in the neighbourhood of our chosen
clock transition, due to the number of distinct Zeeman sublevels and the fact that each
transition is dressed with sidebands displaced by multiples of the ion’s trap frequency. With
our knowledge of the magnetic field, the laser polarization and the frequency of the ion’s
motion (all of which can be determined without using the laser), we are able to uniquely
identify our clock transition. Next, we perform repeated spectroscopy across the clock
transition (see Extended Data Fig. 4); since the drift of the SBS laser’s frequency is the
dominant source of frequency error, any apparent shift in the clock transition is attributed
to the SBS laser. See also ref. 49 where a similar method is used to track the drift of a BCS
laser.
An alternative to this method is to simply wait for a long enough time that the laser drift
decreases to an acceptable level, as suggested by the decay discussed above. In fact, at the
time that the measurements in Extended Data Fig. 4 were taken, the drift of the SBS resonator had
decreased to approximately 30 Hz s⁻¹ (at 1,348 nm). We have found that this level of drift
does not preclude the operation of the clock, even without additional corrections. To obtain
the results in Extended Data Fig. 4, we intentionally applied a linear drift to the SBS laser
at a level approximately equal to the highest drift measured in Extended Data Fig. 3, to
demonstrate how one would be able to proceed if this resonator were being used closer to its
original construction date.
Finally, once the linear drift is brought down to an acceptable level, the error signal of the
clock itself can be used to track the frequency of the SBS laser subsystem with respect to the
frequency of the atomic transition. By accumulating the frequency corrections applied to the
final frequency-shifting AOM, the frequency difference between the SBS laser subsystem and the
clock transition can be monitored in real time (see Extended Data Fig. 4d). In Extended Data
Fig. 4d, we run our clock protocol with a reduced interrogation time, which leads to a
decreased frequency resolution but an increased capture range for the lock signal. For future
operations, we may construct our control software such that a limited version of the clock
protocol runs between all other calibration routines and experiments. The frequency difference
could thus be constantly tracked and programmatically fed forward into each experimental
operation.

Characteristics of the BCS laser
In our laboratory, we have access to an ultra-stable reference cavity, consisting of two
highly reflective mirrors with a spacer made of ultra-low-expansion (ULE) glass. The cavity is
housed in a high vacuum environment inside a chamber that has two independently controlled
radiation shields to minimize temperature fluctuations. The vacuum chamber sits on an active
vibration cancellation stage on an optical table in the same room as the SBS laser setup. The
mirrors are separated by 77.57 mm, corresponding to a free spectral range of 1.932 GHz. The
cavity finesse is approximately 10⁵ at 674 nm, which results in a linewidth of about 20 kHz.
From independent measurements of the ⁸⁸Sr⁺ qubit transition with a laser locked to the ULE
cavity, we have determined that the length of the cavity varies slowly, resulting in a drift
of about 60 mHz s⁻¹ or 1.3 × 10⁻¹⁶ at one second.

Limits of the optical clock measurement
Several factors limit the linewidth with which we are able to interrogate the ion. Experiments
are performed in a room that is subject to acoustic noise from four cryogenic vacuum
apparatuses in constant operation, and, while acoustic noise cancellation is performed in the
optical fibre pathways, there are still sections of free-space optics (approximately 4 m total
path length) that can vibrate and thus write noise onto the optical signal. Two tapered
amplifiers are in the free-space path, which are both driven into saturation. Acoustic
interference can modulate the coupling into the amplifier chip, which can then appear as phase
modulation on the laser. Our ability to stabilize the magnetic field in the cryogenic chamber,
and thus the frequency of the ion’s clock transition, also limits the atomic coherence time
and effectively increases the linewidth of the measured transition. As a check to establish a
baseline level of performance given these noise sources, we use the BCS laser as the local
oscillator in the same clock protocol described in the main text (see Extended Data Fig. 5).
We find the stability of this clock to be close to, but slightly better than, the stability of
the SBS clock, which suggests that noise in the experiment is not a substantial limitation in
our ability to evaluate the performance of the SBS as the local oscillator.

Clock protocol and simulation
Our clock protocol consists of a Ramsey measurement, with the initial π/2 pulse driven by the
SBS laser, an interrogation time of 1 ms, and a final π/2 pulse 90° out of phase with the
first pulse. Both pulses are amplitude corrected using a composite pulse sequence (ref. 50) to
mitigate amplitude fluctuations. After completing this Ramsey sequence, we measure the ion
state via state-dependent scattering of photons on the cycling S1/2 → P1/2 ion transition.
After determining which of the two clock states the ion is in, we update a frequency
correction applied to the ion as follows:

$$\Delta f_{i+1} = \Delta f_i + \frac{1}{2\pi}\,\frac{\alpha m_i}{\tau}, \qquad (1)$$

where τ = 1 ms is the interrogation time, α ≈ 0.2 is a gain term optimized to achieve the best
stability, and $m_i$ represents the measurement result, with $m_i$ = +1 (−1) corresponding to
finding the ion in the D5/2 (S1/2) state.
In order to implement our interleaved clock self-comparison, we run two independent control
loops according to the above protocol. During odd-numbered interrogations, we use and update
the first frequency correction $\Delta f_i^{(1)}$; during even-numbered interrogations we use
and update the second frequency correction $\Delta f_i^{(2)}$. See Extended Data Fig. 6 for an
illustration of the timing of the interleaved clocks. After the experiment is over, the two
frequency series $\Delta f^{(1)}$ and $\Delta f^{(2)}$ are compared against one another to
compute the Allan deviation of the interleaved clock.
The performance of the interleaved clock is compared to a numerical simulation we developed.
This simulation models noise of the SBS laser using measured data for timescales of 1 ms and
above (taken from comparisons with a BCS laser) to which is added a white noise term with
spectral density of 7 Hz² Hz⁻¹ to simulate short-timescale fluctuations of the laser
frequency. In the simulation, the laser frequency is updated once per clock cycle based on ion
measurements that take into account quantum projection noise and dead time effects. Extended
Data Fig. 7 shows the measured Allan deviation of the unlocked SBS laser, the simulated clock
operation in self-comparison mode, and the equivalent Allan deviation that the simulation
predicts if a single clock were run with an interrogation time of 1 ms and dead time of
1.85 ms. The simulated clock self-comparison finds Allan deviation of 4.1 × 10⁻¹⁴ at 1 s, in
excellent agreement with our experimental result of 3.9 × 10⁻¹⁴/√τ. The same model predicts
that a single clock interrogated by the SBS laser would achieve stability of 2.5 × 10⁻¹⁴ in
1 s.

Data availability
The datasets that support this study are available from the corresponding author on reasonable
request.

Code availability
The codes used for analysis and simulations are available from the corresponding author on
reasonable request.

47. Chang, L. et al. Heterogeneous integration of lithium niobate and silicon nitride
waveguides for wafer-scale photonic integrated circuits on silicon. Opt. Lett. 42,
803–806 (2017).
48. Hjelme, D. R., Mickelson, A. R. & Beausoleil, R. G. Semiconductor laser stabilization by
external optical feedback. IEEE J. Quantum Electron. 27, 352–372 (1991).
49. Akerman, N., Navon, N., Kotler, S., Glickman, Y. & Ozeri, R. Universal gate-set for
trapped-ion qubits using a narrow linewidth diode laser. New J. Phys. 17, 113060 (2015).
50. Cummins, H. K., Llewellyn, G. & Jones, J. A. Tackling systematic errors in quantum logic
gates with composite rotations. Phys. Rev. A 67, 042308 (2003).

Acknowledgements We thank A. Libson, G. N. West and I. L. Chuang for helpful discussions.
This work was sponsored by the Under Secretary of Defense for Research and Engineering
under Air Force contract number FA8721-05-C-0002. Opinions, interpretations, conclusions
and recommendations are those of the authors and are not necessarily endorsed by the US
Government.

Author contributions W.L., J.S. and R.M. conceived, designed and carried out the experiments
with the SBS laser. W.L., J.S., D.R. and R.M. conceived, designed and carried out the
experiments with the clock protocol. All authors discussed the results and contributed to the
manuscript.

Additional information
Correspondence and requests for materials should be addressed to W.L.
Peer review information Nature thanks Clément Lacroûte and the other, anonymous,
reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Extended Data Fig. 1 | SBS laser noise. Shown is the measurement of the
674-nm SBS laser subsystem’s frequency noise before and after the application
of feedforward stabilization. The SBS laser subsystem features a white noise
floor of 3 Hz² Hz⁻¹ before feedforward and a gradual increase in noise at lower
offset frequencies. An integration over the noise spectral density yields a
full-width at half-maximum linewidth of 45 Hz, while a calculation of the
white-noise limited linewidth at large offset frequencies yields 9 Hz. With
feedforward turned on, the SBS laser subsystem’s noise increases slightly and
exhibits additional noise peaks that arise from the RF signal generator used for
feedforward correction. The integrated linewidth increases to 51 Hz.
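The white-noise-limited linewidth quoted in this caption can be checked with the textbook relation Δν = π·S₀ for a laser whose frequency noise is a flat, one-sided spectral density S₀ (which gives a Lorentzian line). The short sketch below assumes the quoted 3 Hz² Hz⁻¹ floor is such a one-sided density; the function name is illustrative and not taken from the paper.

```python
import math


def white_noise_linewidth(s0_hz2_per_hz: float) -> float:
    """Lorentzian FWHM (Hz) for a flat one-sided frequency-noise PSD s0 (Hz^2/Hz).

    Assumption: the noise floor is purely white, so the lineshape is Lorentzian.
    """
    return math.pi * s0_hz2_per_hz


# Noise floor before feedforward, as quoted in the caption.
linewidth_hz = white_noise_linewidth(3.0)
print(f"white-noise-limited linewidth: {linewidth_hz:.1f} Hz")
```

Under that assumption the result is close to the 9 Hz stated in the caption. The larger integrated linewidths (45 Hz and 51 Hz) include the excess low-frequency noise, which this relation deliberately ignores.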
Extended Data Fig. 2 | Optimization of feedforward stabilization.
a, Measurement of the differential temperature sensitivity of the SBS
resonator’s two orthogonal-polarization modes. The linewidths of the modes
are measured to be 1.2 MHz (blue arrows). For an applied temperature shift of
ΔT = −0.25 °C, the centre frequency at 1,348 nm changes by 490 MHz, while the
mode separation (black arrows) changes by 11.9 MHz. This corresponds to a
feedforward correction ratio of 40:1. b, Time series trace of the 1,348-nm
SBS laser’s frequency with the linear drift removed and the SBS amplitude
unservoed. The lack of correspondence between the free-running SBS
(red trace) and the polarization beat note (‘Pol. beat ×39’; blue trace) indicates
the inability to cancel frequency drift when amplitude noise is present.
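The arithmetic behind the quoted correction ratio can be laid out explicitly. The sketch below uses only the three numbers given in panel a; the variable names are illustrative, and only magnitudes are used because the caption does not state the signs of the two frequency shifts.

```python
# Values quoted in Extended Data Fig. 2a (magnitudes only; signs are not stated).
DELTA_T_C = 0.25             # |applied temperature shift| in degrees C
CENTRE_SHIFT_MHZ = 490.0     # shift of the mode centre frequency at 1,348 nm
SEPARATION_SHIFT_MHZ = 11.9  # shift of the separation between the two modes

# Ratio of common-mode shift to differential shift: the factor by which the
# polarization beat note must be scaled to predict the laser frequency shift.
ratio = CENTRE_SHIFT_MHZ / SEPARATION_SHIFT_MHZ

# Implied temperature sensitivities of each quantity.
centre_sensitivity = CENTRE_SHIFT_MHZ / DELTA_T_C          # MHz per degree C
separation_sensitivity = SEPARATION_SHIFT_MHZ / DELTA_T_C  # MHz per degree C

print(f"common-to-differential ratio: {ratio:.1f}")
print(f"centre sensitivity: {centre_sensitivity:.0f} MHz/degC")
print(f"separation sensitivity: {separation_sensitivity:.1f} MHz/degC")
```

The computed ratio is a little above 41, in line with the roughly 40:1 ratio stated in the caption and near the empirically tuned scale factor of 39 described in Methods.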
Extended Data Fig. 3 | Linear drift decay of SBS resonator. Record of the
SBS laser’s linear drift at 1,348 nm for two resonators. The first resonator
(red circles) is tracked over 79 days, and its linear drift decreases to a value
of 200 Hz s⁻¹ at the end of the elapsed period of time. The second resonator
(blue squares) is tracked over 260 days and reaches a minimum of 30 Hz s⁻¹.
Extended Data Fig. 4 | Determining and tracking linear drift. a, Rabi
spectroscopy of the |5S1/2, m_J = −1/2⟩ → |4D5/2, m_J = −3/2⟩ clock transition and the
first-order motional sidebands at ν_clock ± ν_trap (blue and red sideband,
respectively) is taken at regular intervals of approximately 25 s. After
performing fits to the symmetric sidebands, we can average the two
frequencies to obtain an accurate measure of the frequency of the central
feature. Only the spectroscopic data for the final experiment (black line) is
shown; for all other datasets, the Gaussian peak fits to the sidebands (red and
blue curves) are shown with progressively darkening colour to illustrate the
movement of these features over time. For these data, an intentional linear
drift of 5 kHz s⁻¹ (at 674 nm; equivalently 2.5 kHz s⁻¹ at 1,348 nm) was applied to
demonstrate the efficacy of this method in cases of high drift, as in the initial
few points shown in Extended Data Fig. 3. b, Rabi spectroscopy of the clock
transition and sidebands after applying a linear drift correction to null out the
natural drift of the resonator’s frequency. Over the course of 20 min of
measurements, very little deviation in the centre frequency is observed.
c, Linear drift determined from the data presented in a and b. The linear drift
can be obtained from a fit (lines) to the apparent frequency of the clock
transition as a function of time (data points). In the first case, with the large
drift intentionally applied to the laser frequency, we obtain a drift of 5.2 kHz s⁻¹
(at 674 nm) from the fit (green line). After a few iterations of applying a
correction and measuring the resulting drift, the drift is driven down to 17 Hz s⁻¹
(blue line). d, Integrated clock correction signal applied to the laser to keep the
frequency resonant with the atom’s transition. In this case, we use a simplified
clock protocol with an interrogation time of τ = 100 μs and no interleaving.
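The drift-extraction procedure described in this caption (average the two sideband frequencies to locate the carrier, then fit a line against time) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' analysis code: the trap frequency, the fit noise and the random seed are hypothetical values chosen for the example, while the 25 s spacing and the 5.2 kHz s⁻¹ drift are taken from the caption.

```python
import random


def linear_fit(t, y):
    """Ordinary least-squares line through (t, y); returns (slope, intercept)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


rng = random.Random(0)   # fixed seed so the example is reproducible
TRUE_DRIFT = 5.2e3       # Hz/s at 674 nm, the value quoted for the first case
TRAP_FREQ = 1.0e6        # Hz, hypothetical trap frequency
FIT_NOISE = 50.0         # Hz, hypothetical uncertainty of each sideband fit

# One spectroscopy scan roughly every 25 s for 20 min.
times = [25.0 * k for k in range(48)]
blue = [TRUE_DRIFT * t + TRAP_FREQ + rng.gauss(0.0, FIT_NOISE) for t in times]
red = [TRUE_DRIFT * t - TRAP_FREQ + rng.gauss(0.0, FIT_NOISE) for t in times]

# Averaging the symmetric sidebands cancels the trap frequency.
centre = [(b + r) / 2.0 for b, r in zip(blue, red)]

drift_674, _ = linear_fit(times, centre)
drift_1348 = drift_674 / 2.0  # frequency doubling: 674-nm drift is twice the 1,348-nm drift

print(f"fitted drift at 674 nm:   {drift_674:.1f} Hz/s")
print(f"equivalent at 1,348 nm:   {drift_1348:.1f} Hz/s")
```

The fitted slope would then be applied as a ramp correction, and the scan-and-fit step repeated until the residual drift is acceptably small.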
Extended Data Fig. 5 | BCS laser ⁸⁸Sr⁺ ion clock. Measured interleaved clock
performance comprising a BCS laser locked to a ⁸⁸Sr⁺ ion operating with 1-ms
interrogation time (blue points). The effective dead time is 4.7 ms. The blue
points represent the frequency noise at a selection of averaging times, and the
vertical blue bars indicate 1σ error. A fit (dashed line) to the data yields a
stability of 3.1 × 10⁻¹⁴/√τ, which is slightly lower than that of the same clock
operated with an SBS laser.
Extended Data Fig. 6 | Schematic of interleaved clock protocol. A pictorial
representation of the interleaved clock procedure is shown. Here the Doppler
segments represent the 700-μs duration in which the ion is Doppler cooled.
During the OP segments, the ion undergoes 450 μs of optical pumping in order
to prepare the electron in the lower level of the clock transition. The
‘Interrogate’ segments are each 1 ms of interrogation time, bounded by
composite π/2 pulses. Last, the ‘Detect’ segments are 700 μs of detection time,
during which the photons emitted by the ion are detected on a photomultiplier
tube and counted by our timing controller. During the ‘Update’ segment, and
depending on the number of photons collected, the state of the ion is
determined, and the frequency of the clock is either increased or decreased. As
discussed in the text, two separate clock signals, $f^{(1)}$ and $f^{(2)}$, are maintained;
here these are indicated as Clock 1 and Clock 2. While the frequency of either
clock is updated, the experiment begins to prepare the state for the next
measurement, as indicated by the black arrows. Each of these clocks is sensitive
to laser frequency fluctuation only during the 1 ms interrogation period of the
total 5.7 ms cycle time; during all other times, the frequency of the laser must
stay within the capture range of the lock.
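The timing figures in this caption can be tallied to see where the quoted cycle and dead times come from. The sketch below assumes that the ‘Update’ step and the composite-pulse overhead add no extra time (the caption notes that state preparation begins while the clock is being updated); the constant names are illustrative.

```python
# Segment durations from the caption, in microseconds.
DOPPLER_US = 700
OPTICAL_PUMP_US = 450
INTERROGATE_US = 1000
DETECT_US = 700

# One pass through Doppler cooling, pumping, interrogation and detection.
single_cycle_us = DOPPLER_US + OPTICAL_PUMP_US + INTERROGATE_US + DETECT_US

# Two interleaved clocks alternate, so each clock repeats every two passes.
interleaved_cycle_us = 2 * single_cycle_us

# Dead time: the part of a clock's cycle during which it does not sense the laser.
single_dead_us = single_cycle_us - INTERROGATE_US
interleaved_dead_us = interleaved_cycle_us - INTERROGATE_US

duty_cycle = INTERROGATE_US / interleaved_cycle_us

print(f"single-clock cycle:      {single_cycle_us / 1000:.2f} ms")
print(f"interleaved cycle:       {interleaved_cycle_us / 1000:.2f} ms")
print(f"single-clock dead time:  {single_dead_us / 1000:.2f} ms")
print(f"interleaved dead time:   {interleaved_dead_us / 1000:.2f} ms")
print(f"interleaved duty cycle:  {duty_cycle:.3f}")
```

Under these assumptions the totals reproduce the 5.7 ms cycle time stated here, as well as the 1.85 ms single-clock dead time used in the simulation and the 4.7 ms effective dead time quoted for the interleaved measurement.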
Extended Data Fig. 7 | Numerical simulation of clock performance. The
measured performance of the stabilized SBS laser (Allan deviation, red curve)
is used as an input into a clock protocol simulation incorporating projection
noise and dead time. The simulation accurately predicts the measured clock
performance via the interleaved self-comparison (green squares) and predicts
a single-clock stability of 2.5 × 10⁻¹⁴/√τ (blue circles).
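To make the servo update rule from the Methods section ‘Clock protocol and simulation’ concrete, the following is a deliberately simplified sketch and not the authors' simulation. It assumes an ideal Ramsey fringe with excitation probability p = [1 + sin(2πδτ)]/2 for residual detuning δ, a constant laser offset, no laser noise and no dead time; the sign convention is chosen so that the feedback is negative. The values τ = 1 ms and α = 0.2 come from the text, and everything else (function name, offset, seed) is illustrative.

```python
import math
import random

TAU = 1e-3    # interrogation time in seconds, from the text
ALPHA = 0.2   # servo gain, from the text


def run_servo(laser_offset_hz, n_shots, seed=0):
    """Track a constant laser offset using the single-shot update rule.

    Each shot applies: correction += ALPHA * m / (2 * pi * TAU), with m = +1 or -1.
    """
    rng = random.Random(seed)
    correction = 0.0
    history = []
    for _ in range(n_shots):
        residual = laser_offset_hz - correction
        # Assumed ideal fringe for a second pi/2 pulse 90 degrees out of phase.
        p_excited = 0.5 * (1.0 + math.sin(2.0 * math.pi * residual * TAU))
        # Projective measurement: a single binary outcome per shot.
        m = 1 if rng.random() < p_excited else -1
        correction += ALPHA * m / (2.0 * math.pi * TAU)
        history.append(correction)
    return history


history = run_servo(laser_offset_hz=20.0, n_shots=20000)
tail = history[5000:]                     # discard the initial lock-in period
mean_correction = sum(tail) / len(tail)
print(f"mean correction after lock-in: {mean_correction:.1f} Hz")
```

Each shot moves the correction by a fixed step of α/(2πτ), about 32 Hz, so individual values scatter widely because of projection noise, while their average settles near the applied offset. An interleaved self-comparison would run two such loops on alternating shots and compare the two correction series.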
Article
Capillary condensation under atomic-scale confinement

Qian Yang¹,² ✉, P. Z. Sun¹,², L. Fumagalli², Y. V. Stebunov², S. J. Haigh³, Z. W. Zhou⁴,
I. V. Grigorieva¹,², F. C. Wang⁵ ✉ & A. K. Geim¹,² ✉

https://doi.org/10.1038/s41586-020-2978-1
Received: 29 April 2020

Capillary condensation of water is ubiquitous in nature and technology. It routinely
occurs in granular and porous media, can strongly alter such properties as adhesion,
lubrication, friction and corrosion, and is important in many processes used by
microelectronics, pharmaceutical, food and other industries (refs. 1–4). The century-old
Kelvin equation (ref. 5) is frequently used to describe condensation phenomena and has
been shown to hold well for liquid menisci with diameters as small as several
nanometres (refs. 1–4,6–14). For even smaller capillaries that are involved in condensation
under ambient humidity and so of particular practical interest, the Kelvin equation is
expected to break down because the required confinement becomes comparable to
the size of water molecules (refs. 1–22). Here we use van der Waals assembly of
two-dimensional crystals to create atomic-scale capillaries and study condensation within
them. Our smallest capillaries are less than four ångströms in height and can accommodate
just a monolayer of water. Surprisingly, even at this scale, we find that the macroscopic
Kelvin equation using the characteristics of bulk water describes the condensation
transition accurately in strongly hydrophilic (mica) capillaries and remains
qualitatively valid for weakly hydrophilic (graphite) ones. We show that this
agreement is fortuitous and can be attributed to elastic deformation of capillary
walls (refs. 23–25), which suppresses the giant oscillatory behaviour expected from the
commensurability between the atomic-scale capillaries and water molecules (refs. 20,21). Our
work provides a basis for an improved understanding of capillary effects at the
smallest scale possible, which is important in many realistic situations.
The Kelvin equation predicts that capillaries become spontaneously filled with water at the
relative humidity

$$\mathrm{RH_K} = \exp\left(-\frac{2\sigma}{k_\mathrm{B} T d \rho_\mathrm{N}}\right) \qquad (1)$$

where σ ≈ 73 mJ m⁻² is the surface tension of water at room temperature T,
ρ_N ≈ 3.3 × 10²⁸ m⁻³ is the number density of water, k_B is the Boltzmann constant and d is
the diameter of the meniscus curvature. For a two-dimensional (2D) confinement created by
parallel walls separated by a distance h, d = h/cos θ where θ is the contact angle of water on
the walls’ material. For capillary condensation to occur at relative humidity (RH)
considerably below 100%, equation (1) dictates that d must be comparable to
2σ/(k_B T ρ_N) ≈ 1.1 nm. For example, under typical ambient RH of 40–50%, water is expected to
condense in slits with h < 1.5 nm and cylindrical pores with diameters <3 nm, if θ is close to
zero. Even stronger confinement is required for capillaries involving less hydrophilic
materials. So far, a broad consensus has been reached that the Kelvin equation remains
accurate for menisci with d ≥ 8 nm (refs. 1–4,6–11) and can also describe condensation
phenomena in hydrophilic pores as small as 4 nm in diameter (refs. 12–14). To achieve
agreement with the experiments at this scale, the Kelvin equation is usually modified to
account for ‘wetting films’ that are adsorbed on internal surfaces before the condensation
transition and effectively narrow the capillaries. For the smallest capillaries, the thickness
of the wetting films was used as a free parameter. In the real world, pores, cracks and
cavities obviously do not terminate at the scale of several nanometres but extend even below
1 nm or 2σ/(k_B T ρ_N), the fact that makes condensation phenomena omnipresent under ambient
conditions. The latter scale is comparable to the diameter of water molecules, which makes it
challenging to study experimentally because of difficulties in creating the required
atomic-scale confinement (refs. 1,10,12), the varying thickness of wetting films
(refs. 1,2,7–13,17) and huge capillary pressures that can cause considerable deformations
(refs. 13,23–25). As for theory, the Kelvin equation is also believed to reach its
applicability limit for confinement containing a few molecular layers because, at this
smallest scale, the properties of water notably change (refs. 2,3,12,15,16) and the
description in terms of homogeneous macroscopic thermodynamics becomes questionable
(refs. 1–4,16–20), leaving aside the fact that such quantities as d and θ in equation (1) can
no longer be defined (refs. 1–3,18–20).
The capillary devices that we studied are shown schematically in Fig. 1a. Their most
important part is atomically flat 2D channels made
¹National Graphene Institute, University of Manchester, Manchester, UK. ²Department of Physics and Astronomy, University of Manchester, Manchester, UK. ³School of Materials, University of
Manchester, Manchester, UK. ⁴Key Laboratory of Advanced Technologies of Materials, School of Materials Science and Engineering, Southwest Jiaotong University, Chengdu, China. ⁵Chinese
Academy of Sciences Key Laboratory of Mechanical Behavior and Design of Materials, Department of Modern Mechanics, University of Science and Technology of China, Hefei, China.
✉e-mail: qian.yang-2@manchester.ac.uk; wangfc@ustc.edu.cn; geim@manchester.ac.uk

[Fig. 1 image, panels a–e. Recoverable labels: AFM chamber; top crystal; graphene spacers; bottom crystal; water molecules; chamber with controlled humidity; STEM bright field, four graphene layers; scale bars, 20 nm and 500 nm. Plot axes: sagging depth (Å) versus relative humidity (%); inset, profiles versus distance (nm) at RH from 50% to 90%.]
Fig. 1 | Atomic-scale capillaries and water condensation inside. a, Schematic
of the capillary devices studied here. b, Cross-sectional imaging of a four-layer
graphite capillary by scanning transmission electron microscopy (STEM). The
top layer was more than 100 nm thick in this case. c, d, AFM imaging of the same
mica capillary (N = 11) exposed to 30% and 95% relative humidity, respectively.
In the dry state, the top crystal sagged by ∼5 Å, but it became flat at high RH, as
illustrated in the corresponding schematics above the images. The black
dotted lines indicate the edge of the top crystal (compare with a). In the upper
part of the AFM images, the colour scale is given by the observed sagging (grey
curves). The bottom part shows graphene spacers without the top crystal cover
(dark-to-bright scale, 40 Å). e, Sagging depth δ as a function of RH for a graphite
capillary with N = 4. Coloured symbols, AFM measurements. The grey symbol
with error bars indicates our experimental accuracy. The two solid curves in
orange indicate the constant sagging δ₀ below the condensation transition and
the ln(RH) dependence above it. The transition is marked by the dashed vertical
line. Inset, AFM profiles (averaged over 100 nm along the channel) of the top
crystal for several values of RH. All the AFM measurements were carried out in
the non-contact PeakForce mode (Methods, ‘AFM topography under
controlled humidity’).
by van der Waals (vdW) assembly following the fabrication procedures described in the Methods. In brief, two atomically flat crystals were exfoliated from bulk muscovite mica or graphite to become the top and bottom walls of our capillaries. Separately, narrow strips of multilayer graphene were fabricated to serve as spacers between the two mica or graphite crystals. Stacking the crystals and spacers on top of each other resulted in the 2D channels shown in Fig. 1 and Extended Data Fig. 1. We used graphene spacers between N = 2 and about 10 layers thick so that the capillaries had the designated height Na (see Fig. 1a), where a ≈ 3.35 Å is the effective thickness of monolayer graphene26,27. Examples of transmission electron microscopy imaging of our capillaries are provided in Fig. 1b and Extended Data Fig. 1d. Mica and graphite were chosen as archetypal strongly and weakly hydrophilic materials. Their contact angles are known to be in the range of 0–20° and 55–85°, respectively16,28,29. For surfaces exposed to air, θ is close to the above upper bounds28,29 (Methods).

To detect RH at which capillary condensation occurred in the 2D capillaries described above, we exploited the fact26,30 that the suspended thin crystals exhibited noticeable sagging caused by their vdW adhesion to sidewalls (Fig. 1c). In our experiments we found that, when the capillaries became filled with water, the sagging depth δ diminished (Fig. 1d), presumably because intercalating water molecules ‘screen’ the adhesion27,30. To make the resulting changes in δ detectable by atomic force microscopy (AFM), it was important to choose the thickness H of the top crystal and the channel width w carefully (see Fig. 1a). As described in Methods (‘Remnant sagging above the condensation transition’), these two parameters define the stiffness of the top crystal and, hence, how deeply it bends inwards. We found that, for w ≈ 150 nm, the top crystal should be ∼50–70 nm thick to exhibit a sagging depth δ of several ångströms. If either w or H were changed only by a factor of 2, the strong dependence δ ∝ w⁴/H³ resulted in either collapsed channels (the top crystal attached to the bottom one) or such a small δ (<1 Å) that the condensation transition was impossible to discern by AFM. The capillaries studied here were typically 5–10 μm long.

As shown in Fig. 1a and Extended Data Fig. 1c, our capillary devices were assembled on top of a silicon nitride membrane. It had a rectangular opening that was extended into the bottom crystal by dry etching. The Si chip supporting the entire assembly was used to separate two miniature gas chambers that were integrated into an AFM set-up as shown in Extended Data Fig. 2a. The bottom chamber provided variable humidity so that one entrance of the 2D capillaries was exposed to a chosen RH. The opposite entrance was facing the top chamber, which enclosed the AFM scanning head and was usually kept at low humidity. The two-chamber configuration allowed us to avoid the influence of RH on measurements of the top crystal’s topography (for example, no condensation occurred at the AFM tip during scanning)31. Examples of AFM imaging for mica and graphite devices are given in Fig. 1c, d and in Extended Data Fig. 3, respectively. They reveal pronounced sagging under dry conditions, which disappeared in high humidity. Typical evolution of the top crystal’s profiles with changing RH is shown in Fig. 1e and Extended Data Figs. 4 and 5. In these measurements, we increased RH inside the bottom chamber in steps of 5%, waited for an

[Figure 2, panels a and b (image not reproduced). a, Mica channels; b, graphite channels. Axes: relative humidity RHC and RHK (%) versus channel height (Å); panel a also carries an energy axis (mJ m⁻²) and marks one to four water layers. Curve labels: θ = 20° and θ = 0° in a; θ = 85° and θ = 75° in b.]

Fig. 2 | Condensation transition under extreme 2D confinement. a, Relative humidity RHC required for water condensation in mica channels of different heights h. Blue circles indicate experimental observations, their size reflects the 3.5% experimental uncertainty in determining RHC (Methods, ‘AFM topography under controlled humidity’). Two solid curves indicate RHK given by equation (1) with bulk water’s characteristics for the range of possible θ for mica (colour-coded). The upper curve (open black circles), with its own y axis and the common x axis, shows our MD calculations for changes in γSL caused by restructuring of water inside 2D channels (θ ≈ 10°). The arrows mark the energy minima that correspond to the integer number of water monolayers that can fit inside the 2D capillaries. Red symbols (connected by the dashed curve) are the expected behaviour calculated using the oscillating γSL shown in the upper curve and equation (2). Black dashed curve, same analysis but assuming fully flexible capillary walls allowing relaxation into the energy minima at commensurate h. Green filled circles, same analysis but for a finite rigidity of the confining walls. b, Same as a, but for graphite capillaries. The simulated curves are for θ ≈ 85°.

hour for the system to stabilize and then recorded AFM images. The temperature was kept at 294 ± 1 K. For the device in Fig. 1e, the sagging remained practically constant for RH ≤ 75% and then exhibited a pronounced jump at RHC, which we attribute to the condensation transition (another example is shown in Extended Data Fig. 5). Further increase in RH led to a gradual decrease in δ such that the top crystal became practically flat at RH > 95% (Fig. 1). The remnant sagging at RH > RHC is well described by the negative capillary pressure which keeps the top crystal bent inwards even after water has filled the 2D channels, suppressing the adhesion of the top crystal to the sidewalls. Indeed, for RH > RHC, δ is expected to evolve proportionally to ln(RH) and reach zero at 100% humidity23,24, in agreement with the observed behaviour in Fig. 1e and Extended Data Fig. 6 (Methods, ‘Remnant sagging above the condensation transition’). If we repeated the same measurements but with decreasing RH, a reverse jump occurred at the same RHC, that is, the condensation transition was non-hysteretic (Extended Data Fig. 4a; Methods, ‘Non-hysteretic behaviour of the condensation transition’). Note, however, that it could take up to several days for capillaries exposed to high RH to dry out completely and return to their original state (Extended Data Fig. 4b). On the other hand, for measurements with increasing RH, no difference in RHC was observed after either an hour or days of equilibration. Accordingly, our experiments were normally carried out with increasing rather than decreasing RH, as in Fig. 1e.

Figure 2 summarizes our results for the condensation transitions observed in mica and graphite 2D capillaries. To allow more accurate comparison between data collected from different devices, we have accounted for the fact that capillaries with the same N often exhibited different sagging in their dry state, δ0. For capillaries with large δ0, we observed consistently lower RHC than for those with small initial sagging and same N. Moreover, comparing capillaries with different N but similar channel heights h = Na − δ0, we found close values of RHC (Extended Data Fig. 5). This implies that it was the narrowest, central region of the 2D channels that determined the onset of condensation, in agreement with general expectations32 (Methods, ‘Effect of initial sagging’). Accordingly, to account for the effect of different δ0, Fig. 2 plots RHC as a function of h rather than of N. For mica capillaries, the experimental data are well described by equation (1) using θ and σ of bulk water. Because RHK(h) depends little on the exact value of θ for strongly hydrophilic capillaries (Fig. 2a), the comparison of RHC for mica with equation (1) is straightforward. This is not the case for weakly hydrophilic graphite, for which relatively small variations in θ lead to considerable changes in RHK(h) as per equation (1). Nonetheless, the values of RHC observed for our graphite capillaries fall well within the range expected from the Kelvin equation using the contact angles θ = 80 ± 5°, typical for graphite surfaces under ambient conditions29.

It is surprising that the macroscopic Kelvin equation using the characteristics of bulk water describes condensation in our mica capillaries so well and also provides qualitative agreement for the graphite capillaries. As mentioned in the introduction, strong discrepancy is expected for the ångström-scale confinement where only one or two layers of water fit inside capillaries. Before trying to explain the unexpected agreement between the experiment and the macroscopic Kelvin equation, we note that RHC values in Fig. 2a are notably lower than the RH values required to achieve condensation in the previous studies for d ≥ 8 nm. At our low RH, no continuous wetting layer is expected even on fresh mica surfaces12,33, and a partial coverage by monolayer water is probably suppressed further by adsorbates from air, which are responsible for the relatively large θ close to 20°. The same consideration about the apparent absence of wetting films also applies for the graphite capillaries in which the wetting transition is even less likely1,2,28. Second, to avoid the macroscopic variables σ and θ that are poorly defined under our extreme confinement, the Kelvin equation can be rewritten as1,2,18

$\mathrm{RH_K} = \exp\left[-\dfrac{2(\gamma_{SV} - \gamma_{SL})}{h k_{B} T \rho_{N}}\right]$ (2)

where γSV and γSL are the surface energies for solid–vapour and solid–liquid interfaces, respectively, and γSV − γSL = σ cos θ. The energy γSV

is largely independent of h because the interaction of gas molecules with surfaces should depend little on confinement. Also, ρN changes relatively little for nearly incompressible water20,34. Therefore, the dominant effect of extreme confinement is likely to come from γSL(h), which is governed by vdW interactions of liquid water with solid surfaces. Because these interactions are short-range, it is predominantly the first near-surface layer of water that determines γSL. If this layer changes little under confinement, then Δγ = γSL(h) – γSL(∞) ≈ 0, and capillary condensation should closely follow equation (1) even at the nanoscale1,2,18. Substantial changes in γSL and, hence, RHC are expected only in the limit of few-layer water where its near-surface structure is notably altered20,34 (Extended Data Fig. 7).

For further analysis, we used molecular dynamics (MD) simulations (Methods) to evaluate Δγ and the resulting corrections to the macroscopic Kelvin equation, which are given by the factor exp(2Δγ/hkBTρN) according to equation (2). Examples of the calculated Δγ(h) are shown in Fig. 2a and Extended Data Fig. 8. There are pronounced commensurability oscillations20,21,34 in γSL(h) such that energy minima appear if 2D channels accommodate exactly one, two, three or four molecular layers of water. The oscillations practically disappear for h > 15 Å where Δγ becomes almost zero, which also implies that the macroscopic Kelvin equation should be valid in this regime. For smaller h, changes in γSL are comparable to σ, which means that the above correction factor is comparable to RHK itself. Consequently, the simulated RHC(h) dependences shown in Fig. 2 (red dotted curves and symbols) exhibit giant oscillations such that, for incommensurate h, water condensation becomes unfavourable and should not occur even at 100% humidity.

No such oscillatory behaviour could be detected in our experiments. We attribute its absence to elastic adjustment such that 2D channels tend to accommodate an integer number of molecular layers of water20. Indeed, the energy minimization should be applied to the entire system, including the elastic energy of confining walls23–25. For an extremely soft confinement, 2D channels would adjust their h to reach the commensurate states at minima of Δγ. The condensation behaviour in this case should follow the step-like black dashed curves shown in Fig. 2. A finite rigidity pushes the equilibrium conditions away from the commensurability minima. To estimate a likely elastic response of our 2D channels, note that the capillary pressure above RHC, which is defined by σ, keeps the top crystal bent inwards typically by several ångströms (Fig. 1e; Extended Data Figs. 4–6). Similar elastic adjustments can be expected in our capillaries because changes in Δγ are comparable to the absolute value of σ. Accordingly, our confinement should be considered as rather soft. To illustrate the likely condensation behaviour in such a case, the green curves in Fig. 2 show the RHC(h) dependences expected if the walls’ finite rigidity allows their deformations to reach within 0.5 Å of the commensurability minima in Δγ. The latter curves are in good agreement with the experiment and, in the case of graphite capillaries, also exhibit the same trend towards lower RH for h < 10 Å as observed experimentally in Fig. 2b. In principle, the elastic response of 2D confinement could be included in the simulations self-consistently, but the scatter in the experimental data and different H used for different capillary devices make this effort beyond the rationale of the present study.

Finally, we note that elastic adjustments should also play an important role in real-life capillaries responsible for condensation phenomena under ambient humidity. Indeed, capillary pressures at 1-nm scale typically exceed 1 kbar, and the resulting elastic response of even infinitely thick walls can exceed 1 Å for the case of 2D confinement (Methods, ‘Remnant sagging above the condensation transition’). This should force atomic-scale capillaries to elastically adjust their geometry13,23,24, suppressing commensurability oscillations and resulting in the condensation transition at RH close to the values prescribed for a soft confinement. Accordingly, capillary condensation under ambient conditions can be expected to qualitatively follow the macroscopic Kelvin equation, as happened for the reported capillaries.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2978-1.

1. Charlaix, E. & Ciccotti, M. Capillary Condensation in Confined Media (CRC 2010).
2. van Honschoten, J. W., Brunets, N. & Tas, N. R. Capillarity at the nanoscale. Chem. Soc. Rev. 39, 1096–1114 (2010).
3. Malijevský, A. & Jackson, G. A perspective on the interfacial properties of nanoscopic liquid drops. J. Phys. Condens. Matter 24, 464121 (2012).
4. Barsotti, E., Tan, S. P., Saraji, S., Piri, M. & Chen, J.-H. A review on capillary condensation in nanoporous media: implications for hydrocarbon recovery from tight reservoirs. Fuel 184, 344–361 (2016).
5. Thomson, W. On the equilibrium of vapour at a curved surface of liquid. Proc. R. Soc. Edinb. 7, 63–68 (1872).
6. Aukett, P. N., Quirke, N., Riddiford, S. & Tennison, S. R. Methane adsorption on microporous carbons—a comparison of experiment, theory, and simulation. Carbon 30, 913–924 (1992).
7. Fisher, L. R., Gamble, R. A. & Middlehurst, J. The Kelvin equation and the capillary condensation of water. Nature 290, 575–576 (1981).
8. Kohonen, M. M. & Christenson, H. K. Capillary condensation of water between rinsed mica surfaces. Langmuir 16, 7285–7288 (2000).
9. Mitropoulos, A. Ch. The Kelvin equation. J. Colloid Interface Sci. 317, 643–648 (2008).
10. Zhong, J. et al. Capillary condensation in 8 nm deep channels. J. Phys. Chem. Lett. 9, 497–503 (2018).
11. Yang, G., Chai, D., Fan, Z. & Li, X. Capillary condensation of single- and multicomponent fluids in nanopores. Ind. Eng. Chem. Res. 58, 19302–19315 (2019).
12. Kim, S., Kim, D., Kim, J., An, S. & Jhe, W. Direct evidence for curvature-dependent surface tension in capillary condensation: Kelvin equation at molecular scale. Phys. Rev. X 8, 041046 (2018).
13. Gruener, S., Hofmann, T., Wallacher, D., Kityk, A. V. & Huber, P. Capillary rise of water in hydrophilic nanopores. Phys. Rev. E 79, 067301 (2009).
14. Vincent, O., Marguet, B. & Stroock, A. D. Imbibition triggered by capillary condensation in nanopores. Langmuir 33, 1655–1661 (2017).
15. Shin, D., Hwang, J. & Jhe, W. Ice-VII-like molecular structure of ambient water nanomeniscus. Nat. Commun. 10, 286 (2019).
16. Verdaguer, A., Sacha, G. M., Bluhm, H. & Salmeron, M. Molecular structure of water at interfaces: wetting at the nanometer scale. Chem. Rev. 106, 1478–1510 (2006).
17. Matsuoka, H., Fukui, S. & Kato, T. Nanomeniscus forces in undersaturated vapors: observable limit of macroscopic characteristics. Langmuir 18, 6796–6801 (2002).
18. Powles, J. G. On the validity of the Kelvin equation. J. Phys. Math. Gen. 18, 1551–1560 (1985).
19. Walton, J. P. R. B. & Quirke, N. Capillary condensation: a molecular simulation study. Mol. Simul. 2, 361–391 (1989).
20. Cheng, S. & Robbins, M. O. Nanocapillary adhesion between parallel plates. Langmuir 32, 7788–7795 (2016).
21. Knežević, M. & Stark, H. Capillary condensation in an active bath. EPL 128, 40008 (2019).
22. Sing, K. S. W. & Williams, R. T. Historical aspects of capillarity and capillary condensation. Microporous Mesoporous Mater. 154, 16–18 (2012).
23. Schoen, M. & Günther, G. Phase transitions in nanoconfined fluids: synergistic coupling between soft and hard matter. Soft Matter 6, 5832–5838 (2010).
24. Gor, G. Y., Huber, P. & Bernstein, N. Adsorption-induced deformation of nanoporous materials—a review. Appl. Phys. Rev. 4, 011303 (2017).
25. Altabet, Y. E., Haji-Akbari, A. & Debenedetti, P. G. Effect of material flexibility on the thermodynamics and kinetics of hydrophobically induced evaporation of water. Proc. Natl Acad. Sci. USA 114, E2548–E2555 (2017).
26. Radha, B. et al. Molecular transport through capillaries made with atomic scale precision. Nature 538, 222–225 (2016).
27. Gopinadhan, K. et al. Complete steric exclusion of ions and proton transport through confined monolayer water. Science 363, 145–148 (2019).
28. Drelich, J., Chibowski, E., Meng, D. D. & Terpilowski, K. Hydrophilic and superhydrophilic surfaces and materials. Soft Matter 7, 9804–9828 (2011).
29. Mücksch, C., Rösch, C., Müller-Renno, C., Ziegler, C. & Urbassek, H. M. Consequences of hydrocarbon contamination for wettability and protein adsorption on graphite surfaces. J. Phys. Chem. C 119, 12496–12501 (2015).
30. Fumagalli, L. et al. Anomalously low dielectric constant of confined water. Science 360, 1339–1342 (2018).
31. Weeks, B. L. & Vaughn, M. W. Direct imaging of meniscus formation in atomic force microscopy using environmental scanning electron microscopy. Langmuir 21, 8096–8098 (2005).
32. Malijevský, A. & Parry, A. O. Modified Kelvin equations for capillary condensation in narrow and wide grooves. Phys. Rev. Lett. 120, 135701 (2018).
33. Christenson, H. K. & Thomson, N. H. The nature of the air-cleaved mica surface. Surf. Sci. Rep. 71, 367–390 (2016).
34. Neek-Amal, M., Peeters, F. M., Grigorieva, I. V. & Geim, A. K. Commensurability effects in viscosity of nanoconfined water. ACS Nano 10, 3685–3692 (2016).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© Crown 2020
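
The estimates quoted in the main text and in Methods (‘Remnant sagging above the condensation transition’) can be checked numerically. The following is a minimal sketch rather than the authors' analysis code. It evaluates equation (2) with γSV − γSL = σ cos θ, the capillary pressure P ≈ 2σ cos θ/h, and the sagging formula of equation (3). The values of σ, T, E, w and H are those quoted in the text; the number density of bulk water, ρN ≈ 3.34 × 10²⁸ m⁻³, is an assumption not stated in the text, and all function names are hypothetical.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
T = 294.0            # K, temperature quoted in the main text (294 ± 1 K)
SIGMA = 0.073        # N/m, surface tension of bulk water quoted in Methods
RHO_N = 3.34e28      # 1/m^3, ASSUMED number density of bulk water (not given in the text)
E_MICA = 60e9        # Pa, out-of-plane Young's modulus of mica quoted in Methods


def kelvin_rh(h, theta_deg, delta_gamma=0.0):
    """Equation (2) with gamma_SV - gamma_SL = sigma*cos(theta) - delta_gamma.

    h is the channel height in metres and delta_gamma (J/m^2) is the
    confinement-induced change in gamma_SL. Returns RH as a fraction.
    """
    surface_term = SIGMA * math.cos(math.radians(theta_deg)) - delta_gamma
    return math.exp(-2.0 * surface_term / (h * K_B * T * RHO_N))


def capillary_pressure(h, theta_deg):
    """Magnitude of the capillary pressure, P ~ 2*sigma*cos(theta)/h, in Pa."""
    return 2.0 * SIGMA * math.cos(math.radians(theta_deg)) / h


def sagging_depth(pressure, w, thickness, youngs_modulus=E_MICA):
    """Equation (3) of Methods: delta = 5*P*w^4 / (32*E*H^3), in metres."""
    return 5.0 * pressure * w**4 / (32.0 * youngs_modulus * thickness**3)


if __name__ == "__main__":
    # Kelvin humidity for a few channel heights, mica-like contact angle
    for h_angstrom in (5, 10, 20, 40):
        rh = kelvin_rh(h_angstrom * 1e-10, theta_deg=20.0)
        print(f"h = {h_angstrom:2d} Å  ->  RH_K ≈ {100 * rh:.0f}%")

    # Capillary pressure and resulting sagging for the Methods example
    p = capillary_pressure(2e-9, 20.0)
    print(f"P(h = 2 nm) ≈ {p / 1e5:.0f} bar")
    for thickness_nm in (50, 60):
        d = sagging_depth(p, 150e-9, thickness_nm * 1e-9)
        print(f"H = {thickness_nm} nm  ->  δ ≈ {d * 1e10:.1f} Å")
```

With these inputs the pressure at h ≈ 2 nm comes out close to the ≈700 bar quoted in Methods, the pressure at h = 1 nm exceeds 1 kbar, and the sagging for H = 50–60 nm falls roughly in the quoted 4–7 Å range. Passing a non-zero `delta_gamma` multiplies the result by the correction factor exp(2Δγ/hkBTρN) discussed in the main text.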

Methods

Fabrication procedures
The capillary devices studied here were fabricated following the procedures described in refs. 26,27 and shown in the flow chart of Extended Data Fig. 1a. First, a large crystal of multilayer graphene (number of layers N) was prepared on an oxidized Si wafer by mechanical exfoliation. Using electron-beam lithography and oxygen plasma etching, we patterned the crystal into a set of parallel narrow strips that had a width of ∼150 nm and were separated by approximately the same distance (Extended Data Fig. 1b). These spacers were then put on top of a mica or graphite crystal of typically 20 nm in thickness. The latter crystal was prepared on a separate Si wafer and, in this work, is referred to as the bottom crystal (Extended Data Fig. 1a, stage 1).

In parallel, we prepared a suspended silicon nitride (SiN) membrane with a rectangular hole in the centre (Extended Data Fig. 1a, stage 2). To this end, we used commercial Si wafers with 500 nm of SiN deposited on both sides. Using photolithography and reactive ion etching (RIE), we made a window of about 750 × 750 μm² in size in one of the SiN layers. The wafer was then placed in hot KOH to etch through the entire Si thickness and obtain a freestanding SiN membrane of ∼70 × 70 μm² in size. After that, a rectangular hole (∼3 × 20 μm²) was plasma-etched in the SiN membrane by means of another round of photolithography and RIE (Extended Data Fig. 1a, stage 2).

The two-layer assembly consisting of the bottom crystal and graphene spacers (Extended Data Fig. 1a, stage 1) was transferred on top of the SiN membrane in such a way that graphene strips were aligned perpendicular to the long edge of the rectangular hole. This step was followed by RIE from the backside of the Si wafer to extend the hole through the bottom crystal (Extended Data Fig. 1a, stage 3). Finally, another mica (or graphite) crystal was placed on top of the two-layer assembly to form 2D channels (Extended Data Fig. 1a stage 4, Extended Data Fig. 1c).

After each crystal transfer, samples were cleaned in acetone, deionized water and isopropanol. This was followed by annealing at 400 °C in a hydrogen–argon atmosphere for 3 h. Such thorough cleaning was essential to remove polymer residues and other possible contamination, which could otherwise block the capillaries. In our experiments, we used only those capillaries that exhibited uniform sagging along their entire length, such as those shown in Fig. 1c and Extended Data Fig. 3.

The contact angle for muscovite mica and natural graphite used for making our devices was measured by Drop Shape Analyzer 100S (Krüss). We found θ ≈ 80–85° for graphite and 15–20° for mica after exposure to ambient air for a few days, in agreement with previous reports (see, for example, refs. 28,29).

AFM topography under controlled humidity
Our set-up for AFM measurements is shown in Extended Data Fig. 2a. The SiN wafer containing a capillary device such as that shown in Extended Data Fig. 1c was placed to seal an airtight metal chamber with a volume of about 1 cm³. A continuous flow of nitrogen gas into this chamber was provided through miniature inlets. The humidity was controlled by mixing dry nitrogen with nitrogen bubbled through deionized water, using a computer-controlled flowmeter. RH was measured by a humidity sensor (Sensirion), which was mounted inside the bottom chamber. The sensor was calibrated with three different saturated salt solutions (lithium chloride, magnesium nitrate and potassium chloride) to ensure readings of RH with an accuracy of ±1%. A commercial silicone rubber enclosure (Bruker) was attached to the AFM head (Extended Data Fig. 2a). When it was lowered for taking AFM images, the enclosure edges sealed the space above the devices being studied. Fresh silica gel granules were usually placed inside the enclosure to provide low humidity in the top chamber.

All AFM images in our experiments were taken in the PeakForce mode using Dimension FastScan (Bruker) and analysed with WSxM software35. Selected capillaries were imaged at regular RH intervals of 5%, after stabilizing humidity in the bottom chamber for approximately an hour. As an example of our AFM measurements, Extended Data Fig. 3 shows three critical steps in the evolution of the topography for an N = 3 graphite capillary. The initial sagging in this case was ∼4 Å as seen for RH = 55% in Extended Data Fig. 3a. With RH increasing in 5% steps, no change in the sagging profile was observed until we reached 70% RH. At the latter humidity, the top crystal was found to sag notably less (Extended Data Fig. 3b). On increasing RH further, the top crystal gradually lifted and became practically flat at 95% RH (Extended Data Fig. 3c). The flattening process is described in detail below. Because the rapid change in sagging happened somewhere between 65% and 70% RH, we assigned the condensation transition to RHC = 67.5 ± 2.5% with an additional error of ±1% because of the humidity sensor’s accuracy (as indicated by the symbol size in Fig. 2).

The use of low humidity in the top chamber notably improved the stability of AFM imaging but was not essential. Indeed, if we simply connected the top and bottom chambers so that both sides of the studied capillaries were exposed to the same RH, the condensation transition was found to occur at the same RHC as in the case of low RH in the top chamber. This shows that the condensation transition is determined by the RH at the entrance side (that is, the highest RH) (Extended Data Fig. 2b). This observation is consistent with the fact that no difference in sagging was observed along the entire length of the capillaries, even when their exits were at low RH (Fig. 1d; Extended Data Fig. 3b), which indicates no detectable gradient in negative capillary pressure along the 2D channels. The constant capillary pressure can be attributed to a very fast flow of liquid water through our atomically flat capillaries26, which allows essentially the same meniscus curvature at both entrance and exit sides, as shown schematically in Extended Data Fig. 2b. If the top chamber is kept at low humidity, such an equilibrium state is stabilized by a slightly retracted exit meniscus and slow Knudsen diffusion of water vapour near the capillary exit26, which provide the required RH gradient. This is different from the case of nanoporous media with rough internal surfaces and tortuous capillaries, where both liquid and vapour transport are slow, allowing large pressure gradients to build up along the liquid flow direction14.

Non-hysteretic behaviour of the condensation transition
Capillary condensation in nanochannels is often accompanied by hysteresis such that RHC required to reach the transition depends on whether external RH is increased or decreased1,4,11,13,23,24,36. This was not the case for our capillaries, which exhibited no hysteresis within our experimental accuracy. This behaviour is illustrated in Extended Data Fig. 4a, which shows the profiles of the top crystal for a four-layer graphite capillary where RH was changed in a small loop around the transition observed at RHC = 77.5 ± 2.5%. The capillary’s sagging was constant for RH ≤ 75%. Then we increased RH to 80% and equilibrated for 1 h, following the experimental procedures described above. The 5% change in RH led to a pronounced jump in the sagging depth δ, indicating the condensation transition (compare black and red curves in Extended Data Fig. 4a). The capillary profile remained stable while RH was maintained at 80%. When we decreased RH back to 75%, the capillary did not return to the initial dry state after 4 h (blue curve). Nonetheless, the top crystal continued to sag gradually with time (Extended Data Fig. 4a). The dry state was eventually reached (after more than 9 h but less than 16 h). Therefore, the condensation transition occurred at the same RHC with either increasing or decreasing humidity, although long equilibration times were needed for 2D channels to dry up.

The slow recovery of the initial dry state is further exemplified by Extended Data Fig. 4b. It shows the case of a graphite capillary with N = 6, where the condensation transition was found to occur between 60% and 65% RH. In Extended Data Fig. 4b, we first increased RH from 60% directly to 95%, well above the transition (black and red curves, respectively). Then RH was reduced to ∼30%, well below RHC = 62.5 ± 2.5%.
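
The RHC values quoted in Methods (67.5, 77.5 and 62.5, each ± 2.5%) all follow from the same bracketing rule: the transition is placed midway between the last RH step without a jump in sagging and the first step with one. A minimal sketch of that bookkeeping is given below. The function name is hypothetical, and adding the ±1% sensor accuracy linearly to the half-step is an assumption, chosen because it reproduces the 3.5% uncertainty stated in the Fig. 2 caption.

```python
def assign_rhc(rh_before, rh_after, sensor_accuracy=1.0):
    """Bracket the condensation transition between two consecutive RH set points.

    rh_before: last RH (%) at which the sagging was still unchanged.
    rh_after:  first RH (%) at which the sagging had jumped.
    Returns (RH_C, half_step, total_uncertainty), all in % RH.
    The total simply adds the sensor accuracy to the half-step (assumption).
    """
    if rh_after <= rh_before:
        raise ValueError("rh_after must be larger than rh_before")
    rhc = 0.5 * (rh_before + rh_after)
    half_step = 0.5 * (rh_after - rh_before)
    return rhc, half_step, half_step + sensor_accuracy


if __name__ == "__main__":
    # Brackets taken from the three examples described in Methods
    for lo, hi in ((65.0, 70.0), (75.0, 80.0), (60.0, 65.0)):
        rhc, half, total = assign_rhc(lo, hi)
        print(f"{lo:.0f}-{hi:.0f}% RH  ->  RH_C = {rhc:.1f} ± {half:.1f}% (total ± {total:.1f}%)")
```

Because the resolution is set by the 5% step, a finer RH step would be needed to narrow the bracket; the sketch does not attempt to interpolate within a step.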
As seen in Extended Data Fig. 4b, the top crystal regained its original This explanation is supported by MD simulations27. They showed that
profile very slowly, and the capillary returned to its dry state only after capillaries with monolayer spacers (N = 1) always collapsed (indepen-
several days. After this the sagging remained stable. The reason for dently of H) because of vdW interaction between the top and bottom
such a slow drying process remains to be understood. crystals. The collapsed capillaries could be opened by intercalating
It is also worth mentioning that our capillary devices did not show water, because the attraction rapidly diminishes at distances more than
any discernible change in sagging with changing RH below the con- a few ångströms so that even a monolayer of water provided sufficient
densation transition, as seen, for example, in Fig. 1e. This is in contrast ‘screening’ of the vdW attraction26,27.
to the usual elasto-capillary response of nanoporous media, in which In all our measurements, the sagging depth δ0 became abruptly
adsorption of water molecules on internal surfaces leads to notable smaller at the condensation transition but did not completely dis-
strain, usually referred to as the Bangham effect37. Its apparent absence appear. Only if RH was increased further did the remnant sagging
in our experiment is perhaps not surprising. First, as discussed in the gradually decrease, approaching zero at 100% RH, so that the top layer
main text, we expect only a small partial coverage of internal walls by became essentially flat (Fig. 1d, e; Extended Data Fig. 3). The remnant
adsorbed water molecules before the transition. Second, if there were sagging at the transition and its gradual changes with further increases
notable adsorption, the adsorption-induced strain is typically of the in RH can be explained by the negative pressure P caused by the con-
order of 10−4 for materials with high Young’s moduli13,23,24,36. Therefore, densed water meniscus1,40. Let us consider our typical mica capillary
for our top crystals with H and w of ∼100 nm, this strain would translate with w ≈ 150 nm and top-layer thickness H ≈ 50–60 nm (Extended Data
into sagging of the order of 0.1 Å, below our experimental accuracy. Fig. 6a). After the condensation transition occurred at a certain RH
(which depended on channel height h), the top layer remained sagged
Effect of initial sagging typically by several ångströms. At the condensation transition, the
For 2D channels with the same N, their heights h = Na − δ0 could vary capillary pressure is given by P ≈ 2σ cos θ/h. Using the contact angle
considerably because of different H and slightly different w, which θ ≈ 20° for mica and the surface tension of bulk water, σ ≈ 73 mN m−1,
control the initial sagging δ0. This resulted in different RHC observed the Young–Laplace equation yields P ≈ 700 bar for h ≈ 2 nm.
for capillaries with the same N. This behaviour is illustrated in Extended The negative pressure P forces the top crystal to bend downwards,
Data Fig. 5, which shows the condensation transition in two capillaries resulting in its sagging given by41
with N = 5 but different δ0. The capillary in Extended Data Fig. 5a had
a top layer with H ≈ 70 nm and exhibited initial sagging of ∼4 Å. The
$\delta = \dfrac{5Pw^{4}}{32EH^{3}}$, (3)
transition in this device occurred at RHC = 82.5%. The other capillary
(Extended Data Fig. 5b) had a thinner (∼45 nm) top crystal and, accord-
ingly, its δ0 was larger (∼8.5 Å). The latter device exhibited RHC = 72.5%, where E ≈ 60 GPa is Young’s modulus of mica in the out-of-plane direc-
considerably lower than that in Extended Data Fig. 5a. This implies that tion42. Equation (3) yields δ ≈ 4−7 Å, in agreement with our observa-
N was not the characteristic determining the onset of water condensa- tions. This substantiates our model that the remnant sagging above the
tion. The importance of h rather than N is even better exemplified by the condensation transition is caused by the negative capillary pressure.
results of Extended Data Fig. 4: the capillary with N = 4 in Extended Data Similar agreement is found for graphite capillaries, although there is
Fig. 4a exhibited the condensation transition at 77.5% RH, whereas the a larger uncertainty in the estimates (a factor of 2) because P strongly
nominally larger capillary with N = 6 in Extended Data Fig. 4b showed the depends on θ for the large contact angles exhibited by water on graph-
transition at lower RHC = 62.5%. This obviously contradicts the expecta- ite. As RH is increased beyond the condensation point, the meniscus
tion that the smaller-N capillary should exhibit lower RHC. However, extends outside capillaries, and its curvature becomes progressively
because of different δ0, the smaller (N = 4) capillary in Extended Data smaller to match the external RH. Accordingly, the negative capillary
Fig. 4a had h ≈ 7 Å, which was larger than h ≈ 3.5 Å for the larger (N = 6) pressure above RHC evolves with RH and is given by the other Kelvin
capillary in Extended Data Fig. 4b. The smaller RHC for 2D channels equation as1,23,24,36,40
with smaller h agrees with the general expectations.
The above observations strongly suggest that h is the size parameter P = kBTρNln RH. (4)
that best describes the condensation transition in our atomic-scale cap-
illaries. This means that it is the central region between the sagged-top According to this equation, the pressure that bends the top crystal
and flat-bottom crystals where the condensation process is effectively should decrease logarithmically with RH, in good agreement with our
initiated. This is not entirely unexpected because MD simulations have observations (Extended Data Fig. 6b). Note that, close to 100% RH,
previously shown that corner menisci in narrow channels were unfa- δ ∝ P ∝ ln(RH) ≈ (RH – 1) is expected to approach zero linearly, as indeed
vourable for condensation32. Furthermore, note that the mean free observed in Fig. 1e and Extended Data Fig. 6b.
path of water molecules in air is about 65 nm, which is comparable The condition of partially sagged but open capillaries (that is, few
to the channel width w ≈ 150 nm. This implies that the entire chan- ångströms < δ0 < Na, as in our devices) is rather difficult to satisfy experi-
nel should present a single entity from the standpoint of thermody- mentally. Indeed, if we were to decrease H or increase w by only a factor
namics, allowing only one condensation transition over the channel’s of 2 with respect to the optimal design found, δ in equation (3) would
cross-section. To this end, it is important to note that although our increase by an order of magnitude because of the high powers. On the
capillaries contained only a few monolayers of water, there were still mil- other hand, the capillary pressure P in equation (4) depends on RH
lions of molecules inside each capillary, which should be sufficient for only logarithmically, which means that even at very low humidity (for
the thermodynamic description, unlike in the case of nanometre-scale example, 5%), it would be thermodynamically favourable for the top
droplets containing a small number of molecules3,16,18. layer with the non-optimal w or H to bend all the way down and reach
the channel’s bottom. Therefore, such non-optimized 2D channels
Remnant sagging above the condensation transition are unstable with respect to spontaneous water condensation under
Nanometre-thick suspended crystals are known to exhibit considerable low-humidity conditions. If we were to do the opposite and increase
sagging, which is believed to be caused by their vdW attraction to side- the top crystal’s stiffness (by halving w or doubling H), δ in equation (3)
walls26,30,38,39. As water spontaneously condensed inside the 2D channels, becomes so small (<1 Å) that changes in the sagging would be impos-
the sagging was found to decrease suddenly (see the jump-like changes sible to detect by AFM. The above consideration shows that there is a
in Fig. 1e and in Extended Data Figs. 4a and 5), which we attribute to subtle interplay between materials parameters and the design of 2D
suppression of the vdW attraction by intercalating water molecules. channels, and stringent rules should be followed in order to detect the
condensation transition in experiment. Following this insight, we usually increased H by ∼50% for our smallest 2D channels with N = 2 and 3, which ensured that they remained open. Also, when making graphite capillaries, we used top crystals slightly (∼20%) thicker than in the case of mica capillaries with the same N because mica has a higher Young’s modulus than graphite43.
For nanoscale 2D capillaries such as cracks or slits inside bulk materials (H → ∞), their elastic deformations caused by large capillary pressures can notably shift the condensation transition with respect to that expected for the rigid confinement20,21. To estimate the magnitude of such adjustments, let us consider the deformation of a half-space elastic medium subject to the uniform load p over a suspended strip with the width w = 2a in the range of −a ≤ x ≤ a. The vertical deformation is given by44

$$u_z(x) = -\frac{(1-\nu)p}{\pi G}\, a\left[\left(\frac{x}{a}+1\right)\ln|x+a| - \left(\frac{x}{a}-1\right)\ln|x-a| - 2\right], \qquad (5)$$

where ν is Poisson’s ratio and G is the shear modulus. This equation yields the sagging

$$\delta = u_z(0) - u_z(\pm a) = \frac{\ln(2)(1-\nu)pw}{\pi G}. \qquad (6)$$

If we take as an example the elastic properties of graphite with G ≈ 10 GPa and ν ≈ 0.3 (ref. 43), equation (6) yields δ ≈ 2.3 Å for capillary pressures of about 1,000 bar. Such P are typical for cavities of 1–2 nm in height (see above). This indicates that elastic deformations can not only be a contributing factor during the condensation transition23,24,36 but also allow atomic-scale cavities in bulk materials to adjust their size so that an integer number of water layers can fit inside, similar to our case where the top crystal was intentionally made sufficiently flexible.

Molecular dynamics simulations of water–surface interaction under strong confinement
To investigate the dependence of the solid–liquid surface energy γSL on h, MD simulations were performed using LAMMPS simulation code45 and the SPC/E model for water molecules46. The interaction between water and confining walls was modelled by the Lennard–Jones potential with parameters taken from ref. 47. Flat rigid graphene sheets were used to mimic the confining walls. For simplicity, to account for surfaces with different θ, we varied the interaction energies of carbon with hydrogen, εHC, and oxygen, εOC. These energies47 were multiplied by a factor of k that was varied from 0.7 to 1.3 in steps of 0.2 to find the water–wall interaction that would approximate the experimental contact angles. The MD angles θ were estimated by using water droplets containing 4,000 molecules. Our simulations yielded θ ≈ 85°, 63°, 30° and 11° for k = 0.7, 0.9, 1.1 and 1.3, respectively. The insets of Extended Data Fig. 7 show the profiles for the water droplets found in the case of θ ≈ 11° and 85°. We used these two θ and the corresponding k to model γSL(h) for our mica and graphite capillaries, respectively. Note that the former value lies in the middle of the contact-angle interval observed for mica28 and, importantly, our MD results exhibited little sensitivity to the exact θ for strongly hydrophilic capillaries, as expected from the cos θ dependence.
Having established parameters for the desired contact angles, we proceeded to another simulation set-up that consisted of two flat four-layer graphite sheets immersed in a water box containing 40,000 molecules. The dimension of each graphene sheet was 102.2 × 100.9 Å² whereas the water box was 140.0 × 140.0 Å² in size, which allowed water molecules confined between the rigid graphite sheets to exchange easily with outside molecules. After an equilibration run of 1.0 ns, the two sheets were brought progressively closer in steps of 0.2 Å. Each time the system was equilibrated for 0.1 ns and its total potential energy was calculated for further analysis. Periodic boundary conditions were imposed in all three directions. All the simulations were carried out with the isothermal–isobaric ensemble at 298 K. The density profiles found in our simulations are shown in Extended Data Fig. 7. The confined water exhibits a pronounced layered structure that extends over two intermolecular distances from each surface, before the water density converges to its bulk value, in agreement with the earlier literature (see, for example, refs. 19,20,48,49).
The deviations Δγ in the solid–liquid surface energy γSL from its bulk value may be considered as extra work spent to rearrange water molecules into the strongly layered structures shown in Extended Data Fig. 7. If h is sufficiently large, the extra work is negligible because the opposite surfaces do not ‘feel’ each other, and their near-surface water structures remain unchanged with respect to the case of infinite h. However, as the walls are getting closer, the layered structures overlap (see the density profiles for h ≲ 10 Å in Extended Data Fig. 7). As a result, the total energy and, hence, Δγ exhibit pronounced oscillations (Extended Data Fig. 8). Using equation (2) and the numerically found Δγ, it is straightforward to calculate the RH required for water condensation inside atomic-scale capillaries. The results are plotted in Fig. 2 of the main text and reveal giant oscillations in RHC which emerge when the structured layers of water near the two confining surfaces start overlapping. Note that the confining walls in the MD simulations were made rigid, disallowing elastic deformations considered separately in our analysis in Fig. 2.

Data availability
All the mentioned data to support this study and its conclusions are available upon request from Q.Y.

35. Horcas, I. et al. WSXM: a software for scanning probe microscopy and a tool for nanotechnology. Rev. Sci. Instrum. 78, 013705 (2007).
36. Gor, G. Y. et al. Elastic response of mesoporous silicon to capillary pressures in the pores. Appl. Phys. Lett. 106, 261901 (2015).
37. Bangham, D. H. & Fakhoury, N. The expansion of charcoal accompanying sorption of gases and vapours. Nature 122, 681–682 (1928).
38. Li, T. & Zhang, Z. Substrate-regulated morphology of graphene. J. Phys. D 43, 075303 (2010).
39. Scharfenberg, S., Mansukhani, N., Chialvo, C., Weaver, R. L. & Mason, N. Observation of a snap-through instability in graphene. Appl. Phys. Lett. 100, 021910 (2012).
40. Israelachvili, J. N. Intermolecular and Surface Forces (Academic, 2011).
41. Hibbeler, R. C. Mechanics of Materials 814 (Pearson, 2015).
42. McNeil, L. E. & Grimsditch, M. Elastic moduli of muscovite mica. J. Phys. Condens. Matter 5, 1681–1690 (1993).
43. Cost, J. R., Janowski, K. R. & Rossi, R. C. Elastic properties of isotropic graphite. Phil. Mag. 17, 851–854 (1968).
44. Ling, F. F., Lai, W. M. & Lucca, D. A. Fundamentals of Surface Mechanics: With Applications 96–97 (Springer, 2012).
45. Plimpton, S. Fast parallel algorithms for short-range molecular dynamics. J. Comput. Phys. 117, 1–19 (1995).
46. Berendsen, H. J. C., Grigera, J. R. & Straatsma, T. P. The missing term in effective pair potentials. J. Phys. Chem. 91, 6269–6271 (1987).
47. Wu, Y. & Aluru, N. R. Graphitic carbon–water nonbonded interaction parameters. J. Phys. Chem. B 117, 8802–8813 (2013).
48. Cicero, G., Grossman, J. C., Schwegler, E., Gygi, F. & Galli, G. Water confined in nanotubes and between graphene sheets: a first principle study. J. Am. Chem. Soc. 130, 1871–1878 (2008).
49. Sendner, C., Horinek, D., Bocquet, L. & Netz, R. R. Interfacial water at hydrophobic and hydrophilic surfaces: slip, viscosity, and diffusion. Langmuir 25, 10768–10781 (2009).

Acknowledgements This work was funded by Lloyd’s Register Foundation, the European Research Council, Graphene Flagship and the Royal Society. Q.Y. acknowledges support from the Leverhulme Early Career Fellowship, and F.C.W. from the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB22040402) and the CAS Youth Innovation Promotion Association.

Author contributions A.K.G. suggested the project and directed it together with Q.Y. Q.Y. and P.Z.S. fabricated devices. Q.Y. performed measurements and carried out data analysis with help from L.F., Y.V.S., S.J.H. and Z.W.Z. F.C.W. provided theoretical support. A.K.G., Q.Y., F.C.W. and I.V.G. wrote the manuscript. All authors contributed to discussions.

Competing interests The authors declare no competing interests.

Additional information
Correspondence and requests for materials should be addressed to Q.Y., F.C.W. or A.K.G.
Peer review information Nature thanks Patrick Huber and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Extended Data Fig. 1 | Nanofabrication of 2D channels. a, Simplified flow
chart for our fabrication procedures. (1) Graphene spacers and the bottom
crystal of either mica or graphite (shown in yellow) were assembled on top of an
oxidized Si wafer. (2) A suspended SiN membrane with a rectangular hole was
prepared separately. (3) The two-layer assembly was transferred from the Si
oxide wafer onto the SiN membrane. The opening was extended through the
assembly by RIE. (4) The top crystal of mica or graphite was placed on top of
graphene spacers. b, AFM micrograph of graphene spacers with N = 5. The
colour scale is given by the height profile (blue curve). c, Optical image of a final
mica device used in our experiments. The bottom mica crystal shows up in
purple on top of the square SiN membrane. Graphene spacers (N = 3) and the
top mica layer are outlined in blue and yellow, respectively. d, Cross-sectional
scanning transmission electron microscopy image of a graphite channel with
N = 2. The blue ticks mark the channel’s edges.
Extended Data Fig. 2 | Measurements of capillary condensation. a, Our AFM
set-up. Humidified nitrogen gas flows through the bottom chamber made from
an aluminium alloy. A silicon wafer of 15 × 15 mm² in size is seen to cover the
chamber, flush with its top surface. The white rubber gasket was lowered
during AFM measurements to seal the space above the Si wafer. Inset,
cross-sectional schematic showing how capillary devices were mounted during
AFM measurements. b, Schematic of a water plug inside our capillaries. For
brevity, the layered structure of water is ignored in this sketch. When the top
chamber is at low RH, the meniscus slightly retracts inside the capillary to
create a vapour pressure gradient. The RH gradient stabilizes two menisci with
the same curvature at both exit and entrance. The distance from the exit
meniscus to the opening is expected to be short because, in our atomically flat
capillaries, water moves much faster as liquid than vapour26.
Extended Data Fig. 3 | Visualization of the condensation transition using
AFM. a–c, Images of a graphite capillary with N = 3 at RH of 55%, 70% and 95%
(a, b and c, respectively). The upper part of each image shows sagging of the
top graphite crystal (H ≈ 80 nm) into the 2D channel. The lower part shows the
area immediately outside the channel, which is not covered by the top graphite.
The black dotted lines mark a border between the two regions (edge of the top
crystal). The colour scales for the lower and upper parts of the AFM images are
given by the green and black curves, respectively. The profiles are averaged
over ∼100 nm along the y direction, and the curves in the upper parts of all the
panels are provided on the same scale given by the black arrows in panel a. A
small number of horizontal scanning lines (x direction) around the black-dot
dividing lines were removed for clarity because they contained numerous
instabilities caused by the AFM tip moving along the edge of the top crystal and
jumping up and down. Such instabilities are typical for AFM scanning close to
edges.
Extended Data Fig. 4 | Non-hysteretic capillary condensation with slow
dynamics. a, Sagging profiles for a graphite capillary (N = 4) with increasing
and decreasing RH between 75% and 80%. Black curve, initial dry-state profile.
Red curve, RH was increased to 80%. Then, RH was returned to 75% and
maintained at this humidity. AFM profiles were taken after 4 h, 9 h and 16 h
(colour coded). b, The N = 6 graphite capillary was brought from the dry state
(black curve) into the state filled with water and kept for an hour at 95% RH
(red). The humidity was then decreased to ∼30%, well below the condensation
transition observed at 62.5 ± 2.5% for this device. The colour-coded curves
show the time evolution towards the original dry state. Note that the sagging
depths δ for such hysteretic loops were highly reproducible but details of
sagging profiles could differ in different RH cycles. For example, the top
crystal’s adhesion to the right wall was different in the original and final dry
states, as seen in a (compare black and purple curves). This hysteresis is
attributed to irreproducible vdW attachments of top crystals to channel
sidewalls.
Extended Data Fig. 5 | Capillary condensation in 2D channels with different
initial sagging. a, b, Sagging profiles for two N = 5 graphite capillaries with
different δ0. RH was increased in 5% steps (colour coded). The water
condensation transition occurred between 80% and 85% RH in a and between
70% and 75% in b. The difference in RHC for the same N is attributed to different
h in the two cases.
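As a rough illustration of the relation h = Na − δ0 that underlies these two panels, the sketch below evaluates the channel height for the two N = 5 capillaries described in the text. The thickness per graphene spacer layer, a ≈ 3.35 Å, is an assumed value (the graphite interlayer spacing; it is not stated in this section), and the function names are illustrative only.

```python
# Minimal sketch of h = N*a - delta0 (all lengths in ångströms).
# ASSUMPTION: a = 3.35 Å per graphene spacer layer (graphite interlayer
# spacing); this value is not given in the section above.
A_LAYER = 3.35


def channel_height(n_layers, sag0, a=A_LAYER):
    """Return the open channel height h for N spacer layers and initial sagging delta0."""
    return n_layers * a - sag0


def initial_sag(n_layers, height, a=A_LAYER):
    """Invert the same relation: delta0 implied by a quoted height h."""
    return n_layers * a - height


# Two N = 5 capillaries with the initial sagging quoted in the text.
for label, n, sag0 in [("Extended Data Fig. 5a", 5, 4.0),
                       ("Extended Data Fig. 5b", 5, 8.5)]:
    print(f"{label}: N = {n}, delta0 = {sag0} Å -> h = {channel_height(n, sag0):.2f} Å")
```

Under this assumed a, the thinner top crystal (larger δ0) gives the smaller h, which is the device that showed the lower RHC.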
Extended Data Fig. 6 | Remnant sagging above the condensation
transition. a, Schematic of top crystal sagging. b, Typical behaviour observed
for the sagging depth δ as a function of RH, after the condensation transition
occurred at RH < 60%. Symbols: Measurements for two different mica
capillaries with N = 8. The solid curves are best fits using equations (3) and (4)
(colour-coded). The grey symbol with error bars indicates the experimental
accuracy.
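The estimates behind equations (3), (4) and (6) can be checked numerically. The sketch below is only an illustration under stated assumptions: ρN is taken to be the number density of bulk water (≈ 3.34 × 10²⁸ m⁻³) and T = 298 K, neither of which is specified in this section, and all function names are hypothetical. Material parameters (σ, θ, E, G, ν, w, H) are the values quoted in the text.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K
BAR = 1.0e5         # Pa per bar


def young_laplace_pressure(sigma, theta_deg, h):
    """Capillary pressure magnitude P ~ 2*sigma*cos(theta)/h (SI units)."""
    return 2.0 * sigma * math.cos(math.radians(theta_deg)) / h


def plate_sag(p, w, e_mod, thickness):
    """Equation (3): sagging of the top crystal, delta = 5*P*w^4 / (32*E*H^3)."""
    return 5.0 * p * w**4 / (32.0 * e_mod * thickness**3)


def kelvin_pressure(rh, temperature=298.0, rho_n=3.34e28):
    """Equation (4): P = kB*T*rho_N*ln(RH), with RH as a fraction (0-1).

    ASSUMPTION: rho_n is the number density of bulk water and T = 298 K.
    The result is negative for RH < 1 (a pulling pressure).
    """
    return K_B * temperature * rho_n * math.log(rh)


def halfspace_sag(p, w, g_mod, nu):
    """Equation (6): sagging of a bulk half-space, delta = ln(2)*(1-nu)*p*w/(pi*G)."""
    return math.log(2.0) * (1.0 - nu) * p * w / (math.pi * g_mod)


# Mica capillary at the transition: sigma = 73 mN/m, theta = 20 deg, h = 2 nm.
p_yl = young_laplace_pressure(0.073, 20.0, 2e-9)
print(f"Young-Laplace pressure: {p_yl / BAR:.0f} bar")

# Equation (3) with P = 700 bar, w = 150 nm, E = 60 GPa and H = 50-60 nm.
for thickness in (50e-9, 60e-9):
    sag = plate_sag(700 * BAR, 150e-9, 60e9, thickness)
    print(f"H = {thickness * 1e9:.0f} nm -> delta = {sag * 1e10:.1f} Å")

# Equation (4): magnitude of the pressure shrinks as RH approaches 100%.
for rh in (0.6, 0.8, 0.95):
    print(f"RH = {rh:.0%} -> P = {kelvin_pressure(rh) / BAR:.0f} bar")

# Equation (6) for graphite: G = 10 GPa, nu = 0.3, p = 1,000 bar, w = 150 nm.
print(f"half-space delta = {halfspace_sag(1000 * BAR, 150e-9, 10e9, 0.3) * 1e10:.1f} Å")
```

With these inputs the functions return values consistent with the figures quoted in the text (P ≈ 700 bar, δ ≈ 4–7 Å and δ ≈ 2.3 Å). Because equation (3) scales as w⁴/H³, doubling w alone changes δ by a factor of 16 and halving H alone by a factor of 8, which is the "order of magnitude" sensitivity discussed above.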
Extended Data Fig. 7 | MD simulations of strongly confined water. a, Its
density profiles at different distances h between two rigid capillary walls with
the contact angle θ ≈ 11°. b, Same calculations but for contact angle 85°. The
orange dashed lines mark positions of the surfaces that defined the 2D
channels. Water exhibits a pronounced layered structure near each surface,
and the structures start to overlap for h < 15 Å. Top insets, cross-sectional
profiles for water droplets placed on the surfaces with the given θ.
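The Methods note that results for strongly hydrophilic walls are insensitive to the exact θ because of the cos θ dependence, whereas estimates for graphite carry a larger uncertainty. A small sketch can make that contrast concrete. The ±5° window used here is an arbitrary illustrative choice, not a value from the paper, and the helper name is hypothetical; the k-to-θ pairs are those reported in the text.

```python
import math

# Contact angles found in the MD droplet runs for each scaling factor k (from the text).
MD_CONTACT_ANGLES = {0.7: 85.0, 0.9: 63.0, 1.1: 30.0, 1.3: 11.0}


def cos_sensitivity(theta_deg, delta_deg=5.0):
    """Relative spread of cos(theta) over theta +/- delta_deg.

    ASSUMPTION: delta_deg = 5 degrees is an illustrative uncertainty window only.
    """
    low = math.cos(math.radians(theta_deg + delta_deg))
    high = math.cos(math.radians(theta_deg - delta_deg))
    return (high - low) / math.cos(math.radians(theta_deg))


for k, theta in sorted(MD_CONTACT_ANGLES.items()):
    print(f"k = {k}: theta = {theta:.0f} deg, "
          f"relative spread of cos(theta) = {cos_sensitivity(theta):.1%}")
```

For θ ≈ 11° the spread is a few per cent, whereas for θ ≈ 85° it is of order 100% or more, in line with the larger uncertainty quoted for graphite.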
Extended Data Fig. 8 | Changes in the solid–liquid surface energy caused by
atomic-scale confinement. Calculated Δγ(h) for several characteristic θ. The
arrows indicate the number of molecular layers of water that fit inside the 2D
channels.
Article
Catalytic asymmetric addition of an amine
N–H bond across internal alkenes
Yumeng Xi1,2, Senjie Ma1,2 & John F. Hartwig1,2 ✉
https://doi.org/10.1038/s41586-020-2919-z
Received: 26 November 2019
Accepted: 27 October 2020
Published online: 3 November 2020

Hydroamination of alkenes, the addition of the N–H bond of an amine across an
alkene, is a fundamental, yet challenging, organic transformation that creates an
alkylamine from two abundant chemical feedstocks, alkenes and amines, with full
atom economy1–3. The reaction is particularly important because amines, especially
chiral amines, are prevalent substructures in a wide range of natural products and
drugs. Although extensive efforts have been dedicated to developing catalysts for
hydroamination, the vast majority of alkenes that undergo intermolecular
hydroamination have been limited to conjugated, strained, or terminal alkenes2–4;
only a few examples occur by the direct addition of the N–H bond of amines across
unactivated internal alkenes5–7, including photocatalytic hydroamination8,9, and no
asymmetric intermolecular additions to such alkenes are known. In fact, current
examples of direct, enantioselective intermolecular hydroamination of any type of
unactivated alkene lacking a directing group occur with only moderate
enantioselectivity10–13. Here we report a cationic iridium system that catalyses
intermolecular hydroamination of a range of unactivated, internal alkenes,
including those in both acyclic and cyclic alkenes, to afford chiral amines with
high enantioselectivity. The catalyst contains a phosphine ligand bearing
trimethylsilyl-substituted aryl groups and a triflimide counteranion, and the reaction
design includes 2-amino-6-methylpyridine as the amine to enhance the rates of
multiple steps within the catalytic cycle while serving as an ammonia surrogate.
These design principles point the way to the addition of N–H bonds of other
reagents, as well as O–H and C–H bonds, across unactivated internal alkenes to
streamline the synthesis of functional molecules from basic feedstocks.
1Department of Chemistry, University of California, Berkeley, CA, USA. 2Division of Chemical Sciences, Lawrence Berkeley National Laboratory, Berkeley, CA, USA. ✉e-mail: jhartwig@berkeley.edu

Chiral amines are essential structural motifs in numerous active pharmaceutical ingredients and in many agrochemicals and materials. They also serve as chiral catalysts, resolving reagents and chiral auxiliaries14. Thus, efficient methods to prepare chiral amines have been long sought. Traditional approaches15 include chemical16,17 and enzymatic18 reductive amination, hydrogenation19, nucleophilic addition to imines20 and nucleophilic substitution21,22. However, these methods require starting materials containing reactive functionality that is often derived from feedstock alkenes. Thus, hydroamination of alkenes is the most direct method to construct chiral amines from a functional group that is present in basic feedstocks and is typically unprotected. Asymmetric additions to conjugated alkenes, such as dienes23–26 and vinylarenes27,28, have been reported, but the scope of the direct addition of N–H bonds to more common and less reactive unconjugated alkenes is severely limited, and the enantioselectivities of asymmetric processes are far lower than those that would enable applications for the synthesis of chiral amines. Formal hydroaminations provide an alternative approach to this problem, but the use of silanes with electrophilic aminating reagents29 and even metal reductants30, instead of amines, undermines the atom economy of the hydroamination reaction. Direct N–H additions of unactivated internal alkenes that occur with high enantioselectivity are unknown.
There are considerable challenges facing the development of catalytic hydroaminations of unactivated alkenes that bear more than one substituent (Fig. 1a). Both experiments and theoretical studies have shown that the thermodynamic driving force is weak and the kinetic barrier to combining two nucleophiles is high1,31. Moreover, catalysts for hydroamination often catalyse undesirable, competing alkene isomerization, and isomerization is typically faster than addition of the N–H bond during many metal-catalysed hydroaminations (Fig. 1b). Such relative rates lead to a mixture of isomeric products32,33. Many catalysts for alkene hydroamination also promote formation of oxidative amination products by β-hydrogen elimination of β-aminoalkylmetal intermediates34,35. These isomerization and oxidative side reactions must be suppressed to achieve hydroamination of unactivated internal alkenes. Finally, because hydroamination is usually almost ergoneutral, isomerization and racemization of the products during the reaction can erode regioselectivity and enantioselectivity36.
To address these challenges, we modified a neutral iridium complex containing a bisphosphine ligand, which was previously shown to catalyse the formation of hydroamination and oxidative amination products from the reaction of terminal alkenes with amides and indoles34,37–40. We have previously shown that these aminations occur by a mechanism

[Fig. 1 graphic not reproduced. Panel a: addition of H–NH2 across an internal alkene (cat.); challenges listed: weak thermodynamic driving force; low reactivity of internal alkenes; alkene isomerization and chain walking; competing oxidative amination. Panel b: alkene isomerization (reacts much faster) in competition with addition of RNH2, giving mixtures; labels "Selectivity?" and "How to suppress isomerization?". Panel c: design elements: cationic iridium, bidentate amine with adjacent substituent; removable ammonia surrogate; enantioselective, regioselective addition; PyR removal gives primary amines. Catalytic-cycle labels (intermediates I–III): dissociates readily; weakens coordination; extra N-coordination facilitates N–H OA; migratory insertion; e−-deficient Ir accelerates alkene insertion; C–H RE; e−-deficient Ir accelerates reductive elimination; rigid iridacycle prevents β-H elimination.]
Fig. 1 | Catalytic asymmetric hydroamination of unactivated internal
alkenes. a, Long-standing challenges and previous strategies for catalytic
hydroamination of unactivated internal alkenes. b, Alkene isomerization that
leads to a mixture of constitutional isomeric products. c, Design of a cationic
iridium catalyst and an ammonia surrogate based on 2-aminopyridine to
achieve asymmetric hydroamination of internal alkenes. OA, oxidative
addition. RE, reductive elimination.
comprising oxidative addition of N–H bonds, migratory insertion of alkenes and reductive elimination of C–H bonds34. We envisioned that switching the iridium catalyst from neutral to cationic would lead to the formation of cationic iridium intermediates that would undergo migratory insertion of the alkene more rapidly41, a step that has been shown to be rate-limiting in the reaction of terminal alkenes with the neutral catalyst34 (Fig. 1c). If binding and insertion of the alkene were sufficiently enhanced, then the reaction scope might include internal alkenes. In addition, we envisioned that coordinating groups adjacent to the amine (for example, 2-aminopyridine derivatives) could facilitate and make more thermodynamically favourable the initial N–H oxidative addition, and such enhancements of this step could be important because the rate of oxidative addition of electron-deficient metal complexes is generally slower than that of electron-rich ones42. If properly placed, the coordinating groups of the amine could also form a rigid six-membered iridacycle upon insertion of the alkene; this geometry could suppress β-hydrogen elimination to form enamines. Such a combination of 2-aminopyridine and a cationic iridium catalyst was evaluated12 for the hydroamination of terminal vinylarenes, but only briefly for the hydroamination of unactivated α-alkenes. Low enantioselectivities (≤11% enantiomeric excess, e.e.) were observed, and the reactions of unstrained internal alkenes were not examined. We reasoned that a substituent near the binding group of the amine could modulate the strength of the coordination of the pyridine, leading to potential enhancement of the activity of the iridium catalyst, and due to weakened coordination (Fig. 1c). Finally, the pyridyl group of the product could be cleaved by known methods to reveal the corresponding primary amine43.
Figure 2 summarizes our studies on the development of a cationic iridium catalyst and an ammonia surrogate for the asymmetric hydroamination of unactivated internal alkenes. These experiments revealed the effect of the reaction components of the system on the reaction yield, level of alkene isomerization and enantioselectivity. To identify a suitable ammonia surrogate for this reaction, we surveyed a series of heteroaromatic amines and one control arylamine that possess varying structural properties. The hydroamination products from these reactions consisted of three isomers (denoted A, B and C) that probably resulted from competing alkene isomerization and hydroamination. A larger amount of A reflects a faster rate for direct addition of the amine to the starting alkene than alkene isomerization before addition.
As shown in Fig. 2a, hydroamination of cis-4-octene in the presence of [Ir(coe)2Cl]2, (S)-DTBM-SEGPHOS ((S)-(+)-5,5′-bis[di(3,5-di-tert-butyl-4-methoxyphenyl)phosphino]-4,4′-bi-1,3-benzodioxole) and NaBARF (sodium tetrakis[3,5-bis(trifluoromethyl)phenyl]borate) occurred only when 6-substituted 2-aminopyridine derivatives were used as the amine source. The hydroamination of 2-amino-6-methylpyridine formed amines A, B and C in a combined 73% yield, with A constituting 28% of the amine products (defined as A/(A + B + C)). The reaction of 2-amino-6-trifluoromethylpyridine afforded only 5% of isomer A. Other amines tested, including the parent 2-aminopyridine, did not undergo this hydroamination. These results suggest that the substituent in the 6-position of the pyridine ring is essential to promote the hydroamination.
To suppress competing alkene isomerization and improve the reaction yield, we examined catalysts containing a series of bisphosphine ligands and counteranions for hydroamination with 2-amino-6-methylpyridine as the amine. Studies of the electronic and steric properties of ligands indicated that the ligands that are less electron-rich (Fig. 2b, entry 2) and more sterically encumbered (Fig. 2b, entries 3–5) than (S)-DTBM-SEGPHOS form catalysts that are more reactive and selective for direct addition to form amine 1. The observed higher reactivity of catalysts generated from more electron-poor ligands probably results from a greater rate of migratory insertion of the alkene (see below) into the Ir–N bond of complexes containing L2–L5 lacking the methoxy group than into the Ir–N bond of the complex ligated by (S)-DTBM-SEGPHOS possessing the p-OMe group44. The higher selectivity from the more hindered ligands could

[Fig. 2 graphic: reaction schemes and ligand structures are not reproduced; the recoverable conditions and data follow.]

a, Amine screen. Conditions: ArNH2 and alkene, 2.5 mol% [Ir(coe)2Cl]2, 6 mol% (S)-DTBM-SEGPHOS, 6 mol% NaBARF, 120 °C. Products: 4-amine (A), 3-amine (B), 2-amine (C). Results for the six amines screened: 73%^a (28%)^c; 51%^a (5%)^c; 0%^a,b for each of the other four.

b, Catalyst screen with 2-amino-6-methylpyridine. Conditions: 5 mol% L*Ir+X–, 100 °C or 120 °C. Products: 4-amine (1), 3-amine (1′), 2-amine (1″).

Entry    L*                           X      Temp.    4-Selectivity^d   Yield (1+1′+1″)
1^e      (S)-DTBM-SEGPHOS (L1)        BARF   100 °C   48%               16%
2^e      rac-DTB-SEGPHOS (L2)         BARF   100 °C   50%               42%
3^e      rac-TMS-SEGPHOS (L3)         BARF   100 °C   83%               34%
4^e      (R)-TMS-SYNPHOS (L4)         BARF   100 °C   83%               26%
5^e      (R)-TMS-MeOBIPHEP (L5)       BARF   100 °C   83%               22%
6^e      (S)-DTBM-SEGPHOS (L1)        BARF   120 °C   28%               73%
7^f      (S)-DTBM-SEGPHOS (L1)        OTf    120 °C   83%               9%
8^f      (S)-DTBM-SEGPHOS (L1)        BF4    120 °C   67%               9%
9^f      (S)-DTBM-SEGPHOS (L1)        NTf2   120 °C   63%               60% (–95% e.e.)
10       (S)-DTBM-SEGPHOS (L1)        Cl     120 °C   n/a               0%
11^f,g   (R)-TMS-SYNPHOS (L4)         NTf2   120 °C   89%               78% (97% e.e.)

Ligands used in this study: (S)-DTBM-SEGPHOS (L1); rac-DTB-SEGPHOS (R = tBu) (L2); rac-TMS-SEGPHOS (R = SiMe3) (L3); (R)-TMS-SYNPHOS (L4, new ligand); (R)-TMS-MeOBIPHEP (L5).

Fig. 2 | Development of asymmetric hydroamination of unactivated internal alkenes with 2-amino-6-methylpyridine as an ammonia surrogate. a, Identification of suitable ammonia surrogates to enable hydroamination of unactivated internal alkenes. b, Identification of reaction conditions to achieve asymmetric hydroamination and to suppress alkene isomerization. ^a Combined yield. ^b No reaction. ^c Defined as A/(A + B + C). ^d 4-Selectivity defined as 1/(1 + 1′ + 1″). ^e Conditions: 2.5 mol% [Ir(coe)2Cl]2, 6 mol% ligand, 6 mol% NaBARF, toluene. ^f Conditions: 5 mol% [L*Ir(COD)]X, toluene. ^g In 2-MeTHF.
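The selectivity definitions in the Fig. 2 footnotes, A/(A + B + C) and 1/(1 + 1′ + 1″), can be turned into a short calculation. The following is a minimal sketch rather than the authors' procedure: the function names are hypothetical, and the only numbers taken from the article are the 78% combined yield and 89% 4-selectivity of entry 11.

```python
def four_selectivity(y4, y3, y2):
    """Share of the 4-amine among the three regioisomers, 1/(1 + 1' + 1'')."""
    total = y4 + y3 + y2
    if total <= 0:
        raise ValueError("combined yield must be positive")
    return y4 / total


def isomer_yield(combined_yield, selectivity):
    """Yield of one regioisomer from the combined yield and its selectivity."""
    return combined_yield * selectivity


def enantiomeric_excess(major, minor):
    """e.e. as a fraction, from the amounts of the two enantiomers."""
    return (major - minor) / (major + minor)


# Entry 11 of Fig. 2b: 78% combined yield at 89% 4-selectivity.
yield_of_1 = isomer_yield(78.0, 0.89)
print(f"estimated yield of 4-amine 1: {yield_of_1:.1f}%")
```

Under these definitions a high combined yield with low selectivity (entry 6) delivers less of the desired 4-amine than a lower-yielding but more selective run, which is why both columns are reported.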
result from a greater sensitivity of the rate of insertion of the alkene into the metal–amido bond (Fig. 1c, structure II) to steric effects than the rate of insertion into the metal–hydride bond, which probably leads to alkene isomerization45,46. Studies with various counterions revealed that reactions with triflimide as the counterion of the catalyst occurred with the highest selectivity at high conversion (Fig. 2b, entries 6–9). The origin of the effects of the counterions is difficult to ascertain, but an effect is clear. Control experiments showed that the reaction catalysed by a neutral iridium complex under otherwise identical conditions (Fig. 2b, entry 10) resulted in the exclusive formation of oxidative amination products. By combining the chiral ligand that led to the highest selectivity with the triflimide anion in the form of [((R)-TMS-SYNPHOS)Ir(COD)]NTf2, the model hydroamination reaction formed the 4-aminooctane (1) in high yield; this reaction also occurred with remarkably high enantioselectivity (Fig. 2b, entry 11). Thus, the substituent on the amine, the new phosphine ligand and the use of a triflimide counterion all led to the high activity, chemoselectivity and enantioselectivity of the reaction.

With this catalyst and reagent, we examined the scope of alkenes that underwent hydroamination (Fig. 3). Both symmetrical internal alkenes and unsymmetrical internal alkenes underwent hydroamination with 2-amino-6-methylpyridine. The reactions all proceeded in greater than 90% e.e. Hydroamination of symmetrical alkenes afforded products containing linear alkyl groups (1), aryl-substituted alkyl groups (2), branched alkyl groups (3), silyloxy groups (4), alkoxy groups (5) and alkoxycarbonyl groups (6) in good to high yields. Reactions of unsymmetrical internal alkenes bearing polar functional groups at the homoallylic position occurred with synthetically useful regioselectivity (2:1 to 10:1). These functional groups include phthalimidyl groups (7, 8), sulfonamido groups (9), silyloxy groups (10), bis(ethoxycarbonyl)methyl groups (11) and (hetero)aryloxy groups (12–14). These groups presumably influence the regioselectivity by inductive effects that are similar to those observed for other classes of functionalization of unsymmetrical internal alkenes47–49.

The reactivity of the system for Z-alkenes enabled the hydroamination of cyclic alkenes, and these reactions also occurred with high enantioselectivity. The combination of [Ir(coe)2Cl]2, (S)-DTBM-SEGPHOS and NaBARF was used as the catalyst because it was more reactive than [((R)-TMS-SYNPHOS)Ir(COD)]NTf2 for the hydroamination of cyclic alkenes. The cyclic hydrocarbons cyclopentene, cyclohexene, cycloheptene and cyclooctene all underwent hydroamination in high yields (15–18). The hydroamination of a series of substituted cyclopentenes formed chiral amine products with high enantioselectivity (19–23, 90–92% e.e.). The reaction to form the 1,3-substituted cyclopentane 23 from the 4-methoxycarbonyl-substituted cyclopentene occurred with high diastereoselectivity for the trans product. In addition, cyclohexene derivatives with 3,3-substituents underwent hydroamination to afford two products with regioselectivity of approximately 1:3 (24/24′, 25/25′). Although the major isomer is achiral, the chiral minor isomer was formed with good to high enantioselectivity. We observed in some cases that the hydroaminations of substituted cycloalkenes with rings larger than that of cyclopentene occurred to high conversion, but were

[Fig. 3 graphic: product structures are not reproduced; the recoverable conditions and data follow. PyMe denotes the 6-methyl-2-pyridyl group.]

General conditions: alkene and 2-amino-6-methylpyridine, 5 mol% [(R)-TMS-SYNPHOSIr(COD)]NTf2, 120 °C, 2-MeTHF.

a, Acyclic internal alkenes
Product   Yield           e.e.     Regioselectivity
1^a       52%             97%
2^a       82%             92%
3         76%             98%
4         77%             97%
5^b       55%             97%
6         85%             98%
7         63%             95%      2.9:1
8         64%             94%      2.9:1
9         62%             >99%     10:1
10        37%             97%      2.5:1
11        62%             90%      2.2:1
12        56%             >99%
13^c      60% (13+13′)    99%
14        60%             (>20:1 d.r.)
Regioselectivities reported for 12–14: 8:1, 2.8:1 and 4:1.

b^d, Cyclic alkenes
Product   Yield           e.e.     Notes
15        84%
16        81%
17        90%
18        82%
19        60%             92%
20        58%             90%
21        64%             90%
22        53%             90%
23        55%             91%      trans, 10:1 d.r.
24        24%             92%      R = CO2Et; 24′, 48%
25        52% (25+25′)    69%      R = Me; regioselectivity 1:3.5

c^e,f, Products from removal of the pyridyl group
26        85% GC yield (35% yield isolated as Boc amide), >93% e.e.
27        71% yield, >96% e.e.
28        79% yield

Fig. 3 | Scope of internal alkenes that undergo hydroamination. a, Scope of asymmetric hydroamination of acyclic internal alkenes. b, Scope of hydroamination of simple cycloalkenes and of asymmetric hydroamination of substituted cyclic alkenes. c, Products from the removal of the 2-(6-methyl)pyridyl group. ^a 2.5 mol% catalyst. ^b 7.5 mol% catalyst. ^c 20 mol% catalyst. ^d Conditions: 2.5 mol% [Ir(coe)2Cl]2, 7.5 mol% (S)-DTBM-SEGPHOS, 6 mol% NaBARF, 1,4-dioxane, 120 °C. ^e Conditions: PtO2, HCl, H2 (1 atm); NaBH4, THF/EtOH. ^f Enantioselectivities were determined after conversion to the original hydroamination product by palladium-catalysed cross-coupling of the primary amine with 2-bromo-6-methylpyridine.
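The e.e. values in Fig. 3 can be related to a difference in activation free energy between the two enantiomeric pathways. The sketch below assumes a single enantio-determining step under kinetic control, so that the enantiomer ratio equals exp(ΔΔG‡/RT), and it assumes the reaction temperature of 120 °C (393.15 K). Neither assumption is stated in this form by the article, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ddg_from_ee(ee, temp_k):
    """Free-energy gap (kcal/mol) implied by an e.e. given as a fraction."""
    if not 0.0 <= ee < 1.0:
        raise ValueError("ee must lie in [0, 1)")
    enantiomer_ratio = (1.0 + ee) / (1.0 - ee)
    return R_KCAL * temp_k * math.log(enantiomer_ratio)


def ee_from_ddg(ddg_kcal, temp_k):
    """e.e. (fraction) implied by a free-energy gap in kcal/mol."""
    # (k - 1)/(k + 1) with k = exp(x) equals tanh(x / 2)
    return math.tanh(ddg_kcal / (2.0 * R_KCAL * temp_k))


T_REACTION = 393.15  # 120 °C, assumed to be the relevant temperature
for ee in (0.90, 0.97, 0.99):
    print(f"e.e. {ee:.0%} -> ddG = {ddg_from_ee(ee, T_REACTION):.2f} kcal/mol")
```

On these assumptions, e.e. values of 90–99% correspond to gaps of roughly 2 to 4 kcal/mol, so modest changes in transition-state energy account for the whole range seen in the figure.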
complicated by competing alkene isomerization, which led to mixtures of isomers that were difficult to separate.

The pyridyl group of the hydroamination products (1, 9) was cleaved by a short sequence that consisted of protonation, hydrogenation and borohydride reduction. The corresponding primary amines (26, 27) formed in 71–85% yield with little or no erosion in enantiomeric excess (see Supplementary Information sections VI and X). By the same sequence, hydroamination product 6 was converted to the corresponding δ-lactam (28) in 79% yield.

To understand how the combination of a cationic iridium catalyst and 2-amino-6-methylpyridine enabled the hydroamination of unactivated internal alkenes, we investigated the reaction mechanism. The reaction of a substituted cyclopentene with N,N-dideuterio-2-amino-6-methylpyridine showed that the addition occurred in a syn fashion. This stereochemistry is consistent with a mechanism that involves migratory insertion of an alkene, rather than nucleophilic attack on a metal-bound alkene complex (Fig. 4a). Kinetic experiments showed that the reaction is first-order in

[Fig. 4 graphic: schemes, plots and transition-state drawings are not reproduced; the recoverable conditions and data follow.]

a, Deuterium labelling. Methoxycarbonyl-substituted cyclopentene and N,N-dideuterio-2-amino-6-methylpyridine, 2.5 mol% [Ir(coe)2Cl]2, 6 mol% (S)-DTBM-SEGPHOS, 6 mol% NaBARF, dioxane, 120 °C, 36 h. Product 23-D: 44% isolated yield, with 40% D and 90% D incorporation at the two labelled positions.

b, Kinetic orders for the reaction forming 1 with [(R)-TMS-SYNPHOSIr(COD)]NTf2 in 2-MeTHF at 120 °C.
Plot                                                       Linear fit                R2
1/(Initial rate) (×10–6 s M–1) versus 1/[Alkene] (M–1)     y = 5.7x + 1.4            0.99
1/(Initial rate) (×10–6 s M–1) versus [Amine] (M)          y = 2.5x + 0.14           0.99
Initial rate × 107 (M s–1) versus [Ir] (M)                 y = 2.6 × 103x – 4.5      0.99

c, Competition experiment. Alkene with equimolar 2-amino-6-methylpyridine and 2-aminopyridine, 5 mol% [(R)-TMS-SYNPHOSIr(COD)]NTf2, 2-MeTHF, 120 °C, 48 h: no reaction.

d, Computed transition states.
                          TS-1a    TS-2a
Ir–Npy (Å)                2.14     2.42
Ir–Nam (Å)                2.32     2.14
ΔΔG‡ (kcal mol–1)         0        4.1

Fig. 4 | Mechanistic study of the hydroamination. a, Deuterium-labelling experiments. b, Experiments to reveal kinetic orders of each reaction component. c, Competition experiments using 2-amino-6-methylpyridine and 2-aminopyridine. d, Transition-state structures of alkene migratory insertion computed by DFT. Single-point energies were computed at the M06/6-311+G(d,p)/SDD/SMD(1,4-dioxane) level of theory with structures optimized at the B3LYP/6-31G(d)/SDD level. The ligand used for the calculations is (S)-TMS-SEGPHOS. The hydride is located trans to the amido ligand in the lowest-energy transition state (TS-1a) that leads to the (R)-enantiomer but is located trans to the pyridyl group in the lowest-energy transition state (TS-2a) that leads to the (S)-enantiomer. Error bars in b correspond to those in the initial rates that result from errors in the integration of peaks in the gas chromatogram.
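The linear fits printed in Fig. 4b (y = 5.7x + 1.4 for 1/rate against 1/[alkene], and y = 2.5x + 0.14 for 1/rate against [amine], both with y in units of 10^6 s M^-1) can be evaluated numerically. This is a rough sketch, not the authors' analysis: reading the alkene dependence as simple saturation kinetics is an assumption made here, and all function names are hypothetical.

```python
def rate_vs_alkene(alkene_m, slope=5.7e6, intercept=1.4e6):
    """Initial rate (M/s) from the fit 1/rate = slope / [alkene] + intercept."""
    return 1.0 / (slope / alkene_m + intercept)


def rate_vs_amine(amine_m, slope=2.5e6, intercept=0.14e6):
    """Initial rate (M/s) from the fit 1/rate = slope * [amine] + intercept."""
    return 1.0 / (slope * amine_m + intercept)


def saturation_parameters(slope, intercept):
    """Limiting rate and half-saturation concentration of a double-reciprocal fit.

    Assumes rate = v_max * [S] / (k_half + [S]), which is an interpretation
    chosen for this sketch rather than a model stated in the article.
    """
    v_max = 1.0 / intercept
    k_half = slope / intercept
    return v_max, k_half


v_max, k_half = saturation_parameters(5.7e6, 1.4e6)
print(f"v_max = {v_max:.2e} M/s, k_half = {k_half:.2f} M")
print(f"rate at 0.1 M amine: {rate_vs_amine(0.1):.2e} M/s")
print(f"rate at 0.5 M amine: {rate_vs_amine(0.5):.2e} M/s")
```

The rate rises with alkene concentration and falls as amine is added, matching the positive order in alkene and inverse order in amine described in the text.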
[iridium catalyst], positive-order in [cis-4-octene] and inverse-order in [2-amino-6-methylpyridine] (Fig. 4b). These data suggest that a molecule of 2-amino-6-methylpyridine dissociates reversibly from iridium in the catalyst resting state before rate-limiting insertion of the alkene, presumably into the metal–amido bond41. To elucidate why the methyl group of 2-amino-6-methylpyridine is essential in this reaction, we conducted hydroamination of cis-4-octene with equimolar amounts of 2-amino-6-methylpyridine and 2-aminopyridine. Whereas the reaction of 2-amino-6-methylpyridine alone afforded the hydroamination product in high yield, the reaction that contained both 2-amino-6-methylpyridine and 2-aminopyridine provided neither hydroamination product (Fig. 4c). This result implies that stronger

binding of the 2-aminopyridine than of 2-amino-6-methylpyridine inhibits the catalyst. It also implies that the methyl group of 2-amino-6-methylpyridine weakens the binding to the iridium centre, thereby allowing alkene binding and insertion.

To investigate the origin of the high levels of enantioselectivity in the hydroamination, we conducted calculations on the alkene migratory insertion using density functional theory (DFT). On the basis of the kinetic experiments, this step is probably the rate-limiting and enantio-determining step of the catalytic cycle. We calculated 12 possible isomeric transition states of migratory insertion that would lead to either enantiomer of the products with cis-2-butene as the model alkene. The structures of the lowest-energy transition states that lead to the major and minor enantiomers are illustrated in Fig. 4d. The two transition states of many metal-catalysed enantioselective hydrofunctionalization reactions of alkenes to form two enantiomers differ principally by the face of the alkene to which the metal is bound. In the current system, the orientation of ancillary ligands around the metal, in addition to the face of the alkene, differs greatly in the two transition states. The geometry of TS-1a, the transition state leading to the major enantiomer, contains meridionally oriented hydride, pyridine and amido ligands with the hydride trans to the amido group. By contrast, these three ligands in TS-2a, which is the lowest-energy transition state leading to the minor enantiomer, are arranged with the hydride trans to the pyridine donor. A geometry analogous to TS-1a that would form the opposite enantiomer by orienting the methyl groups away from the pyridine ligand (TS-2c and TS-2d; Supplementary Information Scheme S4) is higher in energy than is TS-2a. TS-1a is probably the lowest-energy transition state for several reasons. First, the Ir–Nam bond to the amido group, which is trans to a hydride in TS-1a, is elongated (2.32 Å), thereby leading to higher reactivity towards insertion. Second, the alkene is perpendicular to the P–Ir–P plane in TS-1a, whereas the alkene is almost co-planar with the P–Ir–P plane in TS-2a. These orientations place the substituents on the alkene in TS-1a farther from the phosphine ligand than those on the alkene in TS-2a, leading to less steric hindrance in TS-1a than in TS-2a. This analysis suggests that electronic and steric effects together impart high enantioselectivity to the hydroamination.

Our work demonstrates that the direct N–H addition of amines to unactivated internal alkenes can occur with high enantioselectivity under thermal conditions, without the need for strategies involving formal hydroamination. Despite the typically high barriers and weak thermodynamic driving force for hydroamination, the use of cationic bisphosphine-ligated iridium as the catalyst and 2-amino-6-methylpyridine as the amine led to enhancements of the rates of multiple steps within the catalytic cycle and to suppression of alkene isomerization and oxidative amination, enabling the hydroamination of unactivated internal alkenes in high yield and with high enantioselectivity. The hydroamination products can be converted to the corresponding primary amines readily with preservation of the high enantiomeric excess of the hydroamination products. These design principles should provide a starting point to address the long-standing challenge of applying hydroamination of unactivated internal alkenes to the synthesis of chiral amines and inspire advances in other asymmetric hydrofunctionalizations of internal alkenes.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2919-z.

1. Müller, T. E. & Beller, M. Metal-initiated amination of alkenes and alkynes. Chem. Rev. 98, 675–704 (1998).
2. Müller, T. E., Hultzsch, K. C., Yus, M., Foubelo, F. & Tada, M. Hydroamination: direct addition of amines to alkenes and alkynes. Chem. Rev. 108, 3795–3892 (2008).
3. Huang, L., Arndt, M., Gooßen, K., Heydt, H. & Gooßen, L. J. Late transition metal-catalyzed hydroamination and hydroamidation. Chem. Rev. 115, 2596–2697 (2015).
4. Reznichenko, A. L. & Hultzsch, K. C. in Organic Reactions 1–554 (Wiley, 2015).
5. Gurak, J. A., Yang, K. S., Liu, Z. & Engle, K. M. Directed, regiocontrolled hydroamination of unactivated alkenes via protodepalladation. J. Am. Chem. Soc. 138, 5805–5808 (2016).
6. Karshtedt, D., Bell, A. T. & Tilley, T. D. Platinum-based catalysts for the hydroamination of olefins with sulfonamides and weakly basic anilines. J. Am. Chem. Soc. 127, 12640–12646 (2005).
7. Zhang, J., Yang, C.-G. & He, C. Gold(I)-catalyzed intra- and intermolecular hydroamination of unactivated olefins. J. Am. Chem. Soc. 128, 1798–1799 (2006).
8. Musacchio, A. J. et al. Catalytic intermolecular hydroaminations of unactivated olefins with secondary alkyl amines. Science 355, 727–730 (2017).
9. Nguyen, T. M., Manohar, N. & Nicewicz, D. A. anti-Markovnikov hydroamination of alkenes catalyzed by a two-component organic photoredox system: direct access to phenethylamine derivatives. Angew. Chem. Int. Ed. 53, 6198–6201 (2014).
10. Zhang, Z., Lee, S. D. & Widenhoefer, R. A. Intermolecular hydroamination of ethylene and 1-alkenes with cyclic ureas catalyzed by achiral and chiral gold(I) complexes. J. Am. Chem. Soc. 131, 5372–5373 (2009).
11. Reznichenko, A. L., Nguyen, H. N. & Hultzsch, K. C. Asymmetric intermolecular hydroamination of unactivated alkenes with simple amines. Angew. Chem. Int. Ed. 49, 8984–8987 (2010).
12. Pan, S., Endo, K. & Shibata, T. Ir(I)-catalyzed intermolecular regio- and enantioselective hydroamination of alkenes with heteroaromatic amines. Org. Lett. 14, 780–783 (2012).
13. Vanable, E. P. et al. Rhodium-catalyzed asymmetric hydroamination of allyl amines. J. Am. Chem. Soc. 141, 739–742 (2019).
14. Nugent, T. C. Chiral Amine Synthesis: Methods, Developments and Applications (Wiley VCH, 2010).
15. Trowbridge, A., Walton, S. M. & Gaunt, M. J. New strategies for the transition-metal catalyzed synthesis of aliphatic amines. Chem. Rev. 120, 2613–2692 (2020).
16. Wang, C. & Xiao, J. in Stereoselective Formation of Amines (eds Li, W. & Zhang, X.) 261–282 (Springer, 2014).
17. Nugent, T. C. & El-Shazly, M. Chiral amine synthesis – recent developments and trends for enamide reduction, reductive amination, and imine reduction. Adv. Synth. Catal. 352, 753–819 (2010).
18. Patil, M. D., Grogan, G., Bommarius, A. & Yun, H. Oxidoreductase-catalyzed synthesis of chiral amines. ACS Catal. 8, 10985–11015 (2018).
19. Xie, J.-H., Zhu, S.-F. & Zhou, Q.-L. Transition metal-catalyzed enantioselective hydrogenation of enamines and imines. Chem. Rev. 111, 1713–1760 (2011).
20. Ellman, J. A., Owens, T. D. & Tang, T. P. N-tert-Butanesulfinyl imines: versatile intermediates for the asymmetric synthesis of amines. Acc. Chem. Res. 35, 984–995 (2002).
21. You, S.-L., Zhu, X.-Z., Luo, Y.-M., Hou, X.-L. & Dai, L.-X. Highly regio- and enantioselective Pd-catalyzed allylic alkylation and amination of monosubstituted allylic acetates with novel ferrocene P,N-ligands. J. Am. Chem. Soc. 123, 7471–7472 (2001).
22. Ohmura, T. & Hartwig, J. F. Regio- and enantioselective allylic amination of achiral allylic esters catalyzed by an iridium−phosphoramidite complex. J. Am. Chem. Soc. 124, 15164–15165 (2002).
23. Löber, O., Kawatsura, M. & Hartwig, J. F. Palladium-catalyzed hydroamination of 1,3-dienes: a colorimetric assay and enantioselective additions. J. Am. Chem. Soc. 123, 4366–4367 (2001).
24. Adamson, N. J., Hull, E. & Malcolmson, S. J. Enantioselective intermolecular addition of aliphatic amines to acyclic dienes with a Pd–PHOX catalyst. J. Am. Chem. Soc. 139, 7180–7183 (2017).
25. Long, J., Wang, P., Wang, W., Li, Y. & Yin, G. Nickel/Brønsted acid-catalyzed chemo- and enantioselective intermolecular hydroamination of conjugated dienes. iScience 22, 369–379 (2019).
26. Tran, G., Shao, W. & Mazet, C. Ni-catalyzed enantioselective intermolecular hydroamination of branched 1,3-dienes using primary aliphatic amines. J. Am. Chem. Soc. 141, 14814–14822 (2019).
27. Kawatsura, M. & Hartwig, J. F. Palladium-catalyzed intermolecular hydroamination of vinylarenes using arylamines. J. Am. Chem. Soc. 122, 9546–9547 (2000).
28. Utsunomiya, M. & Hartwig, J. F. Intermolecular, Markovnikov hydroamination of vinylarenes with alkylamines. J. Am. Chem. Soc. 125, 14286–14287 (2003).
29. Yang, Y., Shi, S.-L., Niu, D., Liu, P. & Buchwald, S. L. Catalytic asymmetric hydroamination of unactivated internal olefins to aliphatic amines. Science 349, 62–66 (2015).
30. Gui, J. et al. Practical olefin hydroamination with nitroarenes. Science 348, 886–891 (2015).
31. Johns, A. M., Sakai, N., Ridder, A. & Hartwig, J. F. Direct measurement of the thermodynamics of vinylarene hydroamination. J. Am. Chem. Soc. 128, 9306–9307 (2006).
32. Liu, Z. & Hartwig, J. F. Mild, rhodium-catalyzed intramolecular hydroamination of unactivated terminal and internal alkenes with primary and secondary amines. J. Am. Chem. Soc. 130, 1570–1571 (2008).
33. Huang, J.-M., Wong, C.-M., Xu, F.-X. & Loh, T.-P. InBr3 catalyzed intermolecular hydroamination of unactivated alkenes. Tetrahedron Lett. 48, 3375–3377 (2007).
34. Sevov, C. S., Zhou, J. & Hartwig, J. F. Iridium-catalyzed intermolecular hydroamination of unactivated aliphatic alkenes with amides and sulfonamides. J. Am. Chem. Soc. 134, 11960–11963 (2012).
35. Utsunomiya, M., Kuwano, R., Kawatsura, M. & Hartwig, J. F. Rhodium-catalyzed anti-Markovnikov hydroamination of vinylarenes. J. Am. Chem. Soc. 125, 5608–5609 (2003).
36. Pawlas, J., Nakao, Y., Kawatsura, M. & Hartwig, J. F. A general nickel-catalyzed hydroamination of 1,3-dienes by alkylamines: catalyst selection, scope, and mechanism. J. Am. Chem. Soc. 124, 3669–3679 (2002).
37. Casalnuovo, A. L., Calabrese, J. C. & Milstein, D. Rational design in homogeneous catalysis. Iridium(I)-catalyzed addition of aniline to norbornylene via nitrogen-hydrogen activation. J. Am. Chem. Soc. 110, 6738–6744 (1988).

38. Dorta, R., Egli, P., Zürcher, F. & Togni, A. The [IrCl(diphosphine)]2/fluoride system. Developing catalytic asymmetric olefin hydroamination. J. Am. Chem. Soc. 119, 10857–10858 (1997).
39. Zhou, J. & Hartwig, J. F. Intermolecular, catalytic asymmetric hydroamination of bicyclic alkenes and dienes in high yield and enantioselectivity. J. Am. Chem. Soc. 130, 12220–12221 (2008).
40. Sevov, C. S., Zhou, J. & Hartwig, J. F. Iridium-catalyzed, intermolecular hydroamination of unactivated alkenes with indoles. J. Am. Chem. Soc. 136, 3200–3207 (2014).
41. Hanley, P. S. & Hartwig, J. F. Migratory insertion of alkenes into metal–oxygen and metal–nitrogen bonds. Angew. Chem. Int. Ed. 52, 8510–8525 (2013).
42. Thompson, W. H. & Sears, C. T. Kinetics of oxidative addition to iridium(I) complexes. Inorg. Chem. 16, 769–774 (1977).
43. Smout, V. et al. Removal of the pyridine directing group from α-substituted N-(pyridin-2-yl)piperidines obtained via directed Ru-catalyzed sp3 C–H functionalization. J. Org. Chem. 78, 9803–9814 (2013).
44. Hanley, P. S. & Hartwig, J. F. Intermolecular migratory insertion of unactivated olefins into palladium–nitrogen bonds. Steric and electronic effects on the rate of migratory insertion. J. Am. Chem. Soc. 133, 15661–15673 (2011).
45. Zhang, M., Hu, L., Lang, Y., Cao, Y. & Huang, G. Mechanism and origins of regio- and enantioselectivities of iridium-catalyzed hydroarylation of alkenyl ethers. J. Org. Chem. 83, 2937–2947 (2018).
46. Xing, D., Qi, X., Marchant, D., Liu, P. & Dong, G. Branched-selective direct α-alkylation of cyclic ketones with simple alkenes. Angew. Chem. Int. Ed. 58, 4366–4370 (2019).
47. Xi, Y., Butcher, T. W., Zhang, J. & Hartwig, J. F. Regioselective, asymmetric formal hydroamination of unactivated internal alkenes. Angew. Chem. Int. Ed. 55, 776–780 (2016).
48. Mei, T.-S., Werner, E. W., Burckle, A. J. & Sigman, M. S. Enantioselective redox-relay oxidative Heck arylations of acyclic alkenyl alcohols using boronic acids. J. Am. Chem. Soc. 135, 6830–6833 (2013).
49. Morandi, B., Wickens, Z. K. & Grubbs, R. H. Regioselective Wacker oxidation of internal alkenes: rapid access to functionalized ketones facilitated by cross-metathesis. Angew. Chem. Int. Ed. 52, 9751–9754 (2013).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© The Author(s), under exclusive licence to Springer Nature Limited 2020

Data availability
The data that support the findings of this study are available within the article and its Supplementary Information.

Acknowledgements The enantioselective aspects of the work were supported by the National Institutes of Health under grant R35GM130387 and the catalyst development was supported by the Director, Office of Science, of the US Department of Energy under contract number DE-AC02-05CH11231. Calculations were performed at the Molecular Graphics and Computation Facility at UC Berkeley funded by the NIH (S10OD023532). We gratefully acknowledge Takasago for gifts of (S)-DTBM-SEGPHOS, and H. Celik for assistance with nuclear magnetic resonance (NMR) experiments. Instruments in the College of Chemistry NMR facility are supported in part by NIH S10OD024998. We thank R. G. Bergman, B. Su and T. Butcher for discussions. Y.X. thanks Bristol-Myers Squibb for a graduate fellowship, S. Pedram for supply of NaBARF and D. Small for assistance with DFT calculations.

Author contributions Y.X. and J.F.H. conceived the project. Y.X. discovered the reaction and performed experiments and DFT calculations. S.M. performed experiments for revision. Y.X. and J.F.H. wrote the manuscript.

Competing interests The authors declare no competing interests.

Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2919-z.
Correspondence and requests for materials should be addressed to J.F.H.
Peer review information Nature thanks the anonymous reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Article
Quantification of an efficiency–sovereignty trade-off in climate policy
https://doi.org/10.1038/s41586-020-2982-5
Nico Bauer1 ✉, Christoph Bertram1, Anselm Schultes1, David Klein1, Gunnar Luderer1,2, Elmar Kriegler1, Alexander Popp1 & Ottmar Edenhofer1,2,3
Received: 13 March 2020

The Paris Agreement calls for a cooperative response with the aim of limiting global
warming to well below two degrees Celsius above pre-industrial levels while
reaffirming the principles of equity and common, but differentiated responsibilities
and capabilities1. Although the goal is clear, the approach required to achieve it is not.
Cap-and-trade policies using uniform carbon prices could produce cost-effective
reductions of global carbon emissions, but tend to impose relatively high mitigation
costs on developing and emerging economies. Huge international financial transfers
are required to complement cap-and-trade to achieve equal sharing of effort, defined
as an equal distribution of mitigation costs as a share of income2,3, and therefore the
cap-and-trade policy is often perceived as infringing on national sovereignty2–7. Here
we show that a strategy of international financial transfers guided by moderate
deviations from uniform carbon pricing could achieve the goal without straining
either the economies or sovereignty of nations. We use the integrated assessment
model REMIND–MAgPIE to analyse alternative policies: financial transfers in uniform
carbon pricing systems, differentiated carbon pricing in the absence of financial
transfers, or a hybrid combining financial transfers and differentiated carbon prices.
Under uniform carbon prices, a present value of international financial transfers of
4.4 trillion US dollars over the next 80 years to 2100 would be required to equalize
effort. By contrast, achieving equal effort without financial transfers requires carbon
prices in advanced countries to exceed those in developing countries by a factor of
more than 100, leading to efficiency losses of 2.6 trillion US dollars. Hybrid solutions
reveal a strongly nonlinear trade-off between cost efficiency and sovereignty:
moderate deviations from uniform carbon prices strongly reduce financial transfers
at relatively small efficiency losses and moderate financial transfers substantially
reduce inefficiencies by narrowing the carbon price spread. We also identify risks and
adverse consequences of carbon price differentiation due to market distortions that
can undermine environmental sustainability targets8,9. Quantifying the advantages
and risks of carbon price differentiation provides insight into climate and
sector-specific policy mixes.
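The abstract defines equal sharing of effort as an equal distribution of mitigation costs as a share of income. A minimal sketch of the transfers that this definition implies is given below. It is a static, single-period illustration, not the REMIND–MAgPIE calculation: the three regions and their income and cost figures are hypothetical, and discounting to present value is ignored.

```python
def equal_effort_transfers(incomes, costs):
    """Net transfers received by each region so that (cost - transfer) / income
    is the same everywhere and transfers sum to zero.

    incomes, costs: equal-length lists in the same monetary unit.
    Returns (common_share, transfers).
    """
    if len(incomes) != len(costs) or not incomes:
        raise ValueError("incomes and costs must be non-empty and equal length")
    common_share = sum(costs) / sum(incomes)
    transfers = [c - common_share * y for y, c in zip(incomes, costs)]
    return common_share, transfers


# Hypothetical regions: a rich one with low relative cost and two poorer
# ones with high relative cost (arbitrary monetary units).
incomes = [50.0, 20.0, 10.0]
costs = [1.0, 0.8, 0.6]
share, transfers = equal_effort_transfers(incomes, costs)
print(f"common effort share: {share:.1%}")
for y, c, t in zip(incomes, costs, transfers):
    print(f"income {y:5.1f}  cost {c:4.2f}  net transfer received {t:+.2f}")
```

Regions whose cost share exceeds the global average receive transfers and the others pay, which is the regressive pattern under uniform pricing that the transfers are meant to offset.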
The Paris Agreement (PA) set out ambitious targets for climate change stabilization that require a coordinated effort to limit and reduce greenhouse gas (GHG) emissions. Negotiations of an ambitious international emissions reductions framework building on the PA need to consider at least three criteria. First, fair effort sharing requires that unfair outcomes are avoided, such as regressive policy effects that imply higher relative effort by low-income countries. Second, cost efficiency (minimizing the global aggregate mitigation costs) requires equalization of marginal abatement costs across countries for an incremental ton of carbon dioxide (CO2) avoided. Finally, in this study, national sovereignty focuses on nation states’ aim to maintain governing control of economic resources by limiting international transfer payments6,7,10,11. In view of the deep differences in economic, structural and technological development across countries1,12,13 (Fig. 1a–d), international climate policies have to balance these criteria to achieve the PA targets.

Sovereignty versus equity trade-off
So far, most analyses of the PA climate targets have applied cost-efficient policy frameworks, implying uniform carbon prices across regions, at least in the medium term. Although the uniform price sets the upper bound for the marginal abatement costs of mitigation measures that are undertaken as equal for each region, the resulting relative emissions reductions and mitigation costs vary across regions (Fig. 1e, f).
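The cost-efficiency criterion, equalization of marginal abatement costs, can be illustrated with a toy model. The sketch below assumes quadratic regional abatement costs C_i(a) = 0.5·c_i·a², so that the marginal cost is c_i·a. This functional form, the coefficients and the function names are hypothetical and are not taken from the article or from REMIND–MAgPIE.

```python
def cost_efficient_allocation(cost_coeffs, target):
    """Uniform carbon price and regional abatement that meet `target`
    at minimum total cost under quadratic costs 0.5 * c_i * a_i**2.

    At the optimum every region abates until c_i * a_i equals the price.
    """
    inverse_sum = sum(1.0 / c for c in cost_coeffs)
    price = target / inverse_sum
    abatement = [price / c for c in cost_coeffs]
    return price, abatement


def total_cost(cost_coeffs, abatement):
    """Aggregate abatement cost for a given regional allocation."""
    return sum(0.5 * c * a * a for c, a in zip(cost_coeffs, abatement))


coeffs = [1.0, 4.0]   # region 2 has the steeper marginal cost curve
target = 10.0         # total abatement required (arbitrary units)

price, efficient = cost_efficient_allocation(coeffs, target)
equal_split = [target / len(coeffs)] * len(coeffs)
print(f"uniform price: {price:.1f}, allocation: {efficient}")
print(f"cost with uniform price: {total_cost(coeffs, efficient):.1f}")
print(f"cost with equal split:   {total_cost(coeffs, equal_split):.1f}")
```

In this toy case the region with cheaper abatement does most of the work under a uniform price, which lowers the global cost but concentrates effort, the tension that the section heading refers to.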
1Potsdam Institute for Climate Impact Research (PIK), Member of the Leibniz Association, Potsdam, Germany. 2Technical University of Berlin, Berlin, Germany. 3Mercator Institute on Global Commons and Climate Change (MCC), Berlin, Germany. ✉e-mail: nico.bauer@pik-potsdam.de

[Fig. 1 graphic: plots are not reproduced. Panels a–d share a Year axis (1960–2020) and show income per capita (US$1,000 (2010)) (a), emissions (billion tCO2) (b), emissions per capita (tCO2) (c) and carbon productivity (US$ (2010) per tCO2) (d). Panel e shows the ratio of regional to global relative emissions reduction (global = 100) and panel f the difference of regional to global mitigation costs (percentage-point differences, global = 0), each by region. Regions: Asia, Latin America, Middle East and Africa, OECD, Reforming economies. Models: AIM/CGE, BET, GCAM, GRAPE, IMAGE, MESSAGE–GLOBIOM, POLES, REMIND–MAgPIE, TIAM, WITCH–GLOBIOM.]

Fig. 1 | World economic development and emission mitigation projections. a–d, Global socioeconomic developments across five world regions: income per capita (a), annual CO2 emissions (b), annual CO2 emissions per capita (c) and carbon productivity (d). e, f, Model results for regional emissions reductions (e) and regional mitigation costs relative to GDP (f) in 2040 used by the IPCC for mitigation scenarios. The boxplots show the median, and the 25% and the 75% quartiles; whiskers show 1.5 times the interquartile range. For details on data sources and regional aggregates, see Methods and Extended Data Table 1.
Low-income countries with low carbon and energy productivities tend to reduce emissions by a larger fraction and carry relatively higher mitigation costs than economies with high carbon and energy productivities14,15. In particular, Asian countries show systematically higher relative mitigation costs and stronger relative emissions reductions than Organisation for Economic Co-operation and Development (OECD) countries. The aggregate region of Middle East and Africa shows higher mitigation costs, but weaker emissions reduction rates because of high energy and low carbon intensity. Therefore, without transfers, cost-efficient international climate policies tend to cause regressive income effects that deepen economic inequality.

The Kyoto Protocol negotiations in 1998 inspired research into cost reductions by trading emissions permits16. Various subsequent studies investigated the trade-off between fair effort sharing and national sovereignty of a broad range of initial permit allocations derived from different equity and sovereignty principles17–19. In cap-and-trade systems, permit allocations following the equal per-capita allocation principle imply international transfers2–6 that can reach US$6 trillion present value globally until 2100 (ref. 4). The equal-effort-sharing criterion can require even larger transfers to avoid regressive income effects2. In general, changes in permit allocation do not affect the modelled energy and land-use sector transformation because the carbon price does not change in a globally integrated cap-and-trade system if a given number of permits is allocated differently20.

The PA, with its system of nationally determined contributions (NDCs), moved regionally fragmented climate policies to the centre of analysis. The policy ambition of the NDCs corresponds to regional carbon prices in 2030 ranging from nearly zero to more than US$100 per tCO2 (refs. 15,21–23). This implies more equitable burden sharing, but fragmented policy implementation compromises cost efficiency22–25. In addition, aggregated globally, NDCs fall short of cost-effective emissions reductions required in 2030 to achieve the long-term PA climate targets15,26,27.

So far, studies on effort sharing have assumed full integration of an international cap-and-trade system and varied permit allocations, which either results in regressive outcomes or implies large transfers. However, in these studies, the permit allocation does not affect the cost-optimal energy and land-use transformations. The evaluation of real-world fragmented policy regimes such as the NDCs shows that they fall short of required emissions reductions at relatively high global costs that are not shared equitably15. This study investigates the trade-off between cost efficiency and sovereignty by exploring the policy options between idealized cap-and-trade systems and fragmented systems of real-world policies while pursuing the PA to limit warming to well below 2 °C above pre-industrial levels with equitable effort sharing.

REMIND–MAgPIE and mitigation policies
We develop a policy analysis framework that consistently varies regional policy strength and international transfers in scenarios that comply with the equal-effort-sharing principle and the well below 2 °C target. The policy framework is evaluated using the long-term multi-regional integrated assessment model (IAM) REMIND–MAgPIE, which

[Figure 2 graphic. Panel a: carbon price in 2030 (US$ per tCO2, log scale) by region and for the world average. Panel b: cumulative income loss (billion US$, NPV 2020–2100) by region for the scenarios uniform without transfer, uniform with transfer and differentiated, with a table relating uniform and differentiated carbon pricing to the criteria cost efficiency, sovereignty and equity/fairness. Regions: Reforming economies; Middle East and North Africa; Sub-Saharan Africa; Other Asia; India; China; Latin America; Canada, Australia, New Zealand; Japan; United States; Other Europe; EU.]
Fig. 2 | Carbon pricing and regional effort. a, Regional CO2 prices in 2030 on a log scale. The world average depicts the global average price weighted with the regional CO2 emissions in 2030. b, Total mitigation effort including discounted income losses in NPV and eventual transfers compared with a no-policy baseline for 2020–2100; the discount rate is 5% yr−1. The table summarizes which two out of three criteria are met in each scenario. The differences between components of the uniform pricing schemes represent regional transfers, showing that the six regions at the bottom are donors and the six regions at the top are recipients.
links the regional model of investment and development REMIND with the model of agricultural production and its impact on the environment MAgPIE67. This model has been used to analyse transition pathways28, effort-sharing schemes4 and NDCs21,25. REMIND–MAgPIE integrates the macroeconomy with the land use and the energy sector in a general equilibrium framework including energy and food trade29. The baseline scenario quantifies a middle-of-the-road scenario (Shared Socioeconomic Pathway 2 (SSP2)) with medium economic growth and partial income convergence (Extended Data Fig. 3)29,30. For instance, measured in market exchange rates, US per-capita income in 2020 is 28 times that of India declining to 13 by 2050 and 5 by 2100. In this study, a carbon budget of 1,300 GtCO2 for the period 2011–2100 is assumed31. Short-term policies derived from NDCs are applied in 2020; carbon prices thereafter increase by 5% yr−1 until 2060 and continue growing linearly thereafter. The flattened time profile limits overshoot of the carbon budget. Regional variations in climate policy strength are implemented by varying carbon prices; international transfers are implemented as direct payments. Benefits from avoided climate damages and adaptation costs are not considered in these scenarios (see Methods).

Corner solutions of a policy trilemma
Figure 2 shows that only two of the three criteria of cost efficiency, fairness and sovereignty can be fulfilled simultaneously. The cost-efficient policy with a uniform carbon price (US$56 per tCO2 in 2030) but no transfers leads to mitigation costs of US$12.6 trillion that are distributed regressively across regions (0.30% relative income loss for the European Union (EU) and 3.4% for India). A total of US$4.4 trillion in international transfers is required to neutralize the regressive income effects and achieve equal effort sharing. Alternatively, achieving equal effort sharing without transfers requires regionally differentiated carbon prices. The ratio between the highest and lowest price is 130. The global average carbon price weighted with regional emissions increases to US$225 per tCO2, with a weighted standard deviation of US$400 per tCO2. These deviations from uniform carbon pricing increase global mitigation costs by US$2.6 trillion or 21%. Owing to the nonlinearly shaped regional mitigation cost functions32, the relative increase is much lower than the quadrupling of the average carbon price.

Deriving a trade-off curve
Each of the three corner solutions leads to extreme outcomes in regional mitigation costs, transfers or carbon price differentiation. Therefore, we explore the trade-off between sovereignty and cost efficiency by gradually compressing the carbon price spread, moving from the no-transfer case with differentiated carbon prices towards the uniform carbon price case with transfers. We apply an exponential function to adjust pairs of regional prices (p_i/p_j)^α of the no-transfer case for regions i and j. The compression parameter α is varied between zero (uniform prices case) and one (differentiated prices case). The compressed carbon price set is used in REMIND–MAgPIE and jointly rescaled to meet the carbon budget, which leads to variations of global mitigation costs and global residual transfers. This approach does not necessarily determine the frontier of smallest efficiency losses for varying transfer volumes, but derives a trade-off curve that is economically feasible and complies with the PA climate objective and the equal-effort-sharing criterion (see Methods for detail).

Relaxing uniform carbon pricing policies
The resulting trade-off curve is strongly nonlinear (Fig. 3a). Starting at the solution with uniform carbon prices, the steep drop in transfers follows directly from the cost-efficient solution that requires equalization of marginal abatement costs. If non-OECD economies lower their carbon taxes, their mitigation costs decrease by an amount slightly smaller than the cost increase in OECD countries that need to impose higher carbon taxes to balance the global carbon budget. Hence, relatively large transfer reductions cause only small inefficiencies, if the carbon tax spread remains sufficiently small (Methods, Extended Data Fig. 1).
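
As an illustration of how transfers relate to the equal-effort-sharing criterion stated in Methods, the following minimal sketch solves that criterion for the transfers, giving T_r = ΔY_r − Y_r · (Σ ΔY / Σ Y). The function name, the three regions and all numbers are hypothetical and are not taken from the study; the sketch also assumes that transfers do not change the underlying income losses.

```python
def equal_effort_transfers(income, loss):
    """Return transfers T_r so that (loss_r - T_r) / income_r is equal in all regions.

    income: dict region -> baseline income Y_r (NPV)
    loss:   dict region -> income reduction dY_r (NPV) before transfers
    A positive T_r is received by the region; a negative T_r is paid by it.
    """
    global_rate = sum(loss.values()) / sum(income.values())
    return {r: loss[r] - income[r] * global_rate for r in income}


# Hypothetical numbers in arbitrary money units (not study data).
income = {"A": 1000.0, "B": 500.0, "C": 100.0}
loss = {"A": 5.0, "B": 10.0, "C": 5.0}

transfers = equal_effort_transfers(income, loss)
global_rate = sum(loss.values()) / sum(income.values())

for region in income:
    after = (loss[region] - transfers[region]) / income[region]
    print(f"{region}: transfer {transfers[region]:+.3f}, "
          f"relative loss after transfers {after:.4%}")

# Transfers net out to zero; the gross volume counts only the receipts.
gross_volume = sum(t for t in transfers.values() if t > 0)
print(f"net: {sum(transfers.values()):.6f}, gross volume: {gross_volume:.3f}")
```

In this toy case the region with the lowest relative loss becomes the donor and the other two become recipients, which mirrors the donor and recipient split described for Fig. 2b.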

[Figure 3 graphic. Panel a: interregional transfer (NPV, billion US$) against global mitigation costs (billion US$, NPV), with points labelled by the weighted average and standard deviation of the global carbon price, running from 56;0 (uniform with transfer) to 225;402 (differentiated), and the uniform case without transfer marked separately. Panel b: cumulative CO2 emissions 2020–2100 (GtCO2) in OECD and non-OECD regions against the global average carbon price in 2030 (US$ per tCO2), showing net emissions, emissions from deforestation and FFI, and removals from afforestation, BECCS and DAC. Panel c: year in which a threshold is reached in OECD and non-OECD regions (coal power capacity shutdown exceeds 50%; electricity share exceeds 30%; carbon removal using BECCS exceeds 250 Mt yr−1; fossil primary energy share falls below 50%; total CO2 emissions turn negative) for the differentiated, partially differentiated, uniform and baseline scenarios.]
Fig. 3 | Sovereignty versus cost-efficiency trade-off and consequences of differentiated carbon prices. a, The trade-off curve, including the three corner solutions (marked with red circles). The numbers indicate the 2030 global average carbon price and the standard deviation using the regional CO2 emissions as weights. The costs and transfers are NPVs for the period 2020–2100. The time path of transfers is shown in Extended Data Fig. 5 and discussed in Methods. b, The cumulative net carbon emissions in OECD and non-OECD countries differentiated by emissions sources and carbon removals. FFI, fossil fuel and industry; DAC, direct air capture; BECCS, bioenergy with carbon capture and sequestration. c, Different timing of mitigation measures in OECD and non-OECD regions. ‘Partially differentiated’ is the case with an average carbon price of US$63.3 per tCO2. In some scenarios, the threshold is not reached before 2100 and, therefore, no marker is shown.
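
The point labels in Fig. 3a pair an emission-weighted average carbon price with an emission-weighted standard deviation. The sketch below shows, under stated assumptions, how such a pair could be computed after applying the compression function from Methods, p_r = p_min · (p̃_r / p_min)^α. The region names, prices and emissions are hypothetical, the standard deviation uses a simple weighted population formula (the study may use a different estimator), and the joint rescaling to the carbon budget is omitted because it requires REMIND–MAgPIE runs.

```python
import math


def compress_prices(prices, alpha):
    """Apply p_r = p_min * (p_r / p_min) ** alpha to every regional price."""
    p_min = min(prices.values())
    return {r: p_min * (p / p_min) ** alpha for r, p in prices.items()}


def weighted_mean_std(prices, emissions):
    """Emission-weighted mean and population-style standard deviation."""
    total = sum(emissions.values())
    mean = sum(prices[r] * emissions[r] for r in prices) / total
    var = sum(emissions[r] * (prices[r] - mean) ** 2 for r in prices) / total
    return mean, math.sqrt(var)


# Hypothetical 2030 prices (US$ per tCO2) and emissions (GtCO2); not study data.
prices = {"low": 10.0, "mid": 100.0, "high": 1300.0}
emissions = {"low": 20.0, "mid": 10.0, "high": 5.0}

for alpha in (1.0, 0.5, 0.25, 0.0):
    compressed = compress_prices(prices, alpha)
    spread = max(compressed.values()) / min(compressed.values())
    mean, std = weighted_mean_std(compressed, emissions)
    print(f"alpha={alpha:4.2f}  spread={spread:7.2f}  "
          f"mean={mean:7.2f}  std={std:7.2f}")
```

With α = 1 the prices are unchanged and with α = 0 they collapse to the lowest regional price, so the standard deviation falls to zero. In the study the compressed set is then rescaled so that cumulative emissions meet the carbon budget, which is why the uniform case in Fig. 3a does not sit at the lowest differentiated price.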
Relaxing the no-transfer constraint
Starting at the opposite end of the trade-off curve with full price differentiation, the nonlinear shape highlights the effect of limited transfers. Reducing the price spread by three quarters lessens the global inefficiency by 56%, but requires only 21% of transfers of the uniform price case. Owing to the strongly increasing mitigation costs, the marginal and total costs for OECD countries decline rapidly as their emissions reductions are relaxed, whereas increments of non-OECD countries are smaller, as their emissions constraints need to be tightened. Hence, starting from the solution of full price differentiation makes transfers highly effective with respect to reducing inefficiency while maintaining the equal-effort-sharing criterion.

Unintended effects of distorted markets
The solution to the efficiency–sovereignty trade-off has broader implications. Increasingly differentiated carbon prices lead to a reallocation of regional emissions and multiple market distortions. The regional reallocation of gross emissions and fossil fuel market distortions are largest for relatively small carbon price spreads (Fig. 3b, Extended Data Fig. 5). As the carbon tax spread grows, changes in fossil fuel use become exhausted, while carbon removal is intensified in OECD countries. Hence, untapped abatement potentials in low-carbon-price countries are largely offset by costly abatement options in high-income countries as their ability to reduce fossil fuel use is exhausted. This interacts with land market distortions: emissions in non-OECD countries increase due to land-use extensification by deforestation to export bioenergy, whereas OECD countries reduce agricultural land for afforestation while importing biomass that is used in the energy sector combined with carbon capture and storage (BECCS)8,9. Therefore, market distortions caused by regional policy differentiation risk detrimental impacts on environmental sustainability. See also Extended Data Figs. 5, 6.
The efficiency–sovereignty trade-off also interacts with the timing of mitigation measures across regions. Figure 3c shows for selected indicators of the energy sector transformation the year a threshold is reached. Under uniform carbon pricing, mitigation measures proceed at a similar speed in different regions33. For example, BECCS deployment exceeds 250 MtCO2 yr−1 shortly after 2040 in OECD and non-OECD regions. Spreading carbon prices leads OECD countries to front-load and tighten the timing of measures, whereas non-OECD countries delay and stretch the timing. For instance, non-OECD countries exceed the

BECCS threshold nearly 10 years later, whereas OECD countries pass it more than 10 years earlier. Reliance on fossil fuels changes as well: even at moderate carbon price differentiation, fossil fuel use (including coal) in some developing countries peaks only in 2035, while OECD countries phase out all fossil fuels more rapidly. Moreover, the accelerated energy transition and the amplified carbon removal deployment in OECD countries demand huge energy-sector investments that increasingly crowd out overall investments in non-OECD countries. Hence, relatively low energy prices in non-OECD countries do not necessarily attract investments to support economic development (Extended Data Fig. 6).

The shape of the trade-off matters
The core finding of this research is the strongly nonlinear trade-off between cost-efficiency and sovereignty in achieving the long-term PA climate target in an equitable way. The cost-efficient, yet idealized, policy framework of uniform carbon prices requires international transfers that correspond to 35% of global mitigation costs to avoid inequitable effort sharing. These huge transfers confirm previous cap-and-trade studies2–5,10,15,16, but are unlikely to be agreed on internationally. We find that modest deviations from uniform carbon prices allow transfers to be strongly reduced. Carbon price differentiation, however, causes market distortions, sustainability risks and asynchronous timing of mitigation measures. Real-world climate policy agreements, such as the NDCs, do not reach the emissions reductions necessary to achieve the PA targets and are internationally fragmented, implying a broad variation of regional carbon prices. Previous studies have quantified the resulting inefficiency by comparing fragmented with idealized cap-and-trade systems, but relatively large transfers would still be required15,16.
This study expands the analysis: if sovereignty is prioritized, the implementation of the PA targets with equitable effort sharing would require higher and more differentiated carbon prices. However, moderate transfers allow for convergence of regional carbon prices and substantial reductions of inefficiencies. Full convergence towards uniform carbon prices leads to only small margins in reducing inefficiency, but requires increasing transfers. The high sensitivity close to the corner solution follows from the optimality condition of uniform carbon prices and the nonlinearity of abatement cost functions. These features and, thus, the general results are not unique to the modelling tool used in this study.
The efficiency–sovereignty trade-off and other consequences of carbon price differentiation and transfers for sustainability and mitigation timing inform the discourse about Article 6 of the PA.

Sensitivity of the trade-off curve
The distributional challenge, reflected by the trade-off curve, depends on various assumptions. As a sensitivity analysis, we varied the assumption of socioeconomic drivers of economic inequality. Faster income convergence that reflects more inclusive growth (we use the SSP1 demography and economy projections) softens the distributional challenge by reducing the ratio of global transfers to global mitigation costs from 35% to 31%. Furthermore, the global carbon budget is important: increasing it to 1,600 GtCO2 reduces the maximum carbon price spread to 40, whereas a tighter budget of 950 GtCO2 widens the spread to 500. The metric chosen to operationalize burden sharing is also crucial. In this analysis, we used relative income reduction for the period 2020–2100, which emphasizes the potential losses of developing countries (for example, India). The alternative metric of consumption losses puts stronger emphasis on losses of countries well endowed with fossil fuels. Shortening the time horizon of the metric to 2050 reduces the spread factor to 11. Delaying the deployment of CCS until 2050 or slowing the maximum annual rate of abandoning fossil fuel infrastructure to 6% increases minimum mitigation costs and also intensifies the distributional challenge, requiring either more transfers or inducing higher efficiency losses. Imposing a trade ban on bioenergy increases mitigation costs in the uniform policy regime, while eliminating carbon leakage via bioenergy trade and substantially reducing the economic efficiency loss (Extended Data Table 2, Extended Data Fig. 7).

Inequality and fair climate policies
Equal effort sharing could be considered too weak a fairness criterion for two reasons. First, even under PA warming limits, damages are disproportionally more severe in developing countries34,35. Hence, a more progressive effort-sharing criterion can be justified in a policy framework based only on mitigation costs36. Second, different socioeconomic capabilities and inequality aversion suggest a more progressive effort-sharing scheme for managing commons37. It is also argued that OECD countries bear greater responsibility due to their historic emissions. However, for OECD countries, equal effort and historic responsibility allocation show only small differences, but allocations are substantially shifted between non-OECD countries depending on their fossil fuel endowments4.

Transfers and sovereignty
Equitable effort sharing in international climate policy requires a compromise of the sovereignty–efficiency trade-off. Recent research on international climate policies and cooperation, for example, by carbon tax harmonization, has made progress towards negotiating cooperation with uniform carbon pricing, but has not addressed issues of equity and transfers6,11,38. Transfers might infringe on sovereignty, but do not necessarily run counter to national welfare. Especially in light of the heterogeneity between countries, transfers facilitate agreement in international environmental policies that improve the welfare of all nations by increasing commitments and reducing free-rider behaviour39–41. Following previous research on permit allocation principles, this study refers to sovereignty only as limits on ceding control over economic resources by transfers within the regime of the PA4,10,18,19. In a broader perspective, climate change also relates to the sovereignty of self-governing as well as the sovereignty of territory, if sea-level rise is considered.

New perspective on policy mixes
Large carbon price spreads induce market distortions in energy and land-use transformations, thus undermining climate policy cost-efficiency, effectiveness and broader sustainability targets. Additional sector-specific land-use and energy policies could correct associated market distortions, thereby complementing a non-optimal carbon pricing regime, as exemplified by the bioenergy trade restriction mentioned above. Other complementary policies are coal phase-out, fossil fuel subsidy reform, forest protection and international technology transfers15,42–46. In the context of differentiated carbon prices, these policies could reduce distributional challenges and attenuate market distortions. Moreover, the inclusion of climate change damages is a promising direction for future research47.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2982-5.

1. Okereke, C. & Coventry, P. Climate justice and the international regime: before, during, and after Paris: climate justice and the international regime. Wiley Interdiscip. Rev. Clim. Change 7, 834–851 (2016).

2. Tavoni, M. et al. Post-2020 climate agreements in the major economies assessed in the light of global models. Nat. Clim. Change 5, 119–126 (2015).
3. Tavoni, M. et al. The distribution of the major economies’ effort in the Durban Platform scenarios. Clim. Change Econ. 04, 1340009 (2013).
4. Leimbach, M. & Giannousakis, A. Burden sharing of climate change mitigation: global and regional challenges under shared socio-economic pathways. Climatic Change 155, 273–291 (2019).
5. Lüken, M. et al. The role of technological availability for the distributive impacts of climate change mitigation policy. Energy Policy 39, 6030–6039 (2011).
6. Aldy, J. E., Krupnick, A. J., Newell, R. G., Parry, I. W. H. & Pizer, W. A. Designing climate mitigation policy. J. Econ. Lit. 48, 903–934 (2010).
7. Victor, V. The Collapse of the Kyoto Protocol and the Struggle to Slow Global Warming (Princeton Univ. Press, 2001).
8. González-Eguino, M., Capellán-Pérez, I., Arto, I., Ansuategi, A. & Markandya, A. Industrial and terrestrial carbon leakage under climate policy fragmentation. Clim. Policy 17, S148–S169 (2017).
9. Otto, S. A. C. et al. Impact of fragmented emission reduction regimes on the energy market and on CO2 emissions related to land use: a case study with China and the European Union as first movers. Technol. Forecast. Soc. Change 90, 220–229 (2015).
10. Böhringer, C. & Welsch, H. Burden sharing in a greenhouse: egalitarianism and sovereignty reconciled. Appl. Econ. 38, 981–996 (2006).
11. Nordhaus, W. Climate clubs: overcoming free-riding in international climate policy. Am. Econ. Rev. 105, 1339–1370 (2015).
12. Csereklyei, Z. & Stern, D. I. Global energy use: decoupling or convergence? Energy Econ. 51, 633–641 (2015).
13. International Comparison Program Purchasing Power Parities and Real Expenditures of World Economies: Summary of Results and Findings of the 2011 International Comparison Program (World Bank, 2014).
14. Stern, D. I., Pezzey, J. C. V. & Lambie, N. R. Where in the world is it cheapest to cut carbon emissions? Aust. J. Agric. Resour. Econ. 56, 315–331 (2012).
15. Fujimori, S. et al. Will international emissions trading help achieve the objectives of the Paris Agreement? Environ. Res. Lett. 11, 104001 (2016).
16. Weyant, J. P. & Hill, J. Introduction and overview. The costs of the Kyoto Protocol: a multi-model evaluation. Energy J. (Spec. Issue) vii–xliv (1999).
17. Zhou, P. & Wang, M. Carbon dioxide emissions allocation: a review. Ecol. Econ. 125, 47–59 (2016).
18. van den Berg, N. J. et al. Implications of various effort-sharing approaches for national carbon budgets and emission pathways. Climatic Change https://doi.org/10.1007/s10584-019-02368-y (2019).
19. Höhne, N., den Elzen, M. & Escalante, D. Regional GHG reduction targets based on effort sharing: a comparison of studies. Clim. Policy 14, 122–147 (2014).
20. Manne, A. S. & Stephan, G. Global climate change and the equity–efficiency puzzle. Energy 30, 2525–2536 (2005).
21. Kriegler, E. et al. Making or breaking climate targets: the AMPERE study on staged accession scenarios for climate policy. Technol. Forecast. Soc. Change 90, 24–44 (2015).
22. Aldy, J. et al. Economic tools to promote transparency and comparability in the Paris Agreement. Nat. Clim. Change 6, 1000–1004 (2016).
23. Jacoby, H. D., Chen, Y.-H. H. & Flannery, B. P. Informing transparency in the Paris Agreement: the role of economic models. Clim. Policy 17, 873–890 (2017).
24. Vandyck, T., Keramidas, K., Saveyn, B., Kitous, A. & Vrontisi, Z. A global stocktake of the Paris pledges: implications for energy systems and economy. Glob. Environ. Change 41, 46–63 (2016).
25. Vrontisi, Z. et al. Enhancing global climate policy ambition towards a 1.5 °C stabilization: a short-term multi-model assessment. Environ. Res. Lett. 13, 044039 (2018).
26. Rogelj, J. et al. Paris Agreement climate proposals need a boost to keep warming well below 2 °C. Nature 534, 631–639 (2016).
27. The Emissions Gap Report 2019 (UNEP, 2019).
28. Luderer, G. et al. Residual fossil CO2 emissions in 1.5–2 °C pathways. Nat. Clim. Change 8, 626–633 (2018).
29. Kriegler, E. et al. Fossil-fueled development (SSP5): an energy and resource intensive scenario for the 21st century. Glob. Environ. Change 42, 297–315 (2017).
30. Dellink, R., Chateau, J., Lanzi, E. & Magné, B. Long-term economic growth projections in the shared socioeconomic pathways. Glob. Environ. Change 42, 200–214 (2017).
31. Rogelj, J., Forster, P. M., Kriegler, E., Smith, C. J. & Séférian, R. Estimating and tracking the remaining carbon budget for stringent climate targets. Nature 571, 335–342 (2019); correction 580, E4 (2020).
32. Luderer, G. et al. Economic mitigation challenges: how further delay closes the door for achieving climate targets. Environ. Res. Lett. 8, 034033 (2013).
33. Bauer, N. et al. Shared socio-economic pathways of the energy sector—quantifying the narratives. Glob. Environ. Change 42, 316–330 (2017).
34. Diffenbaugh, N. S. & Burke, M. Global warming has increased global economic inequality. Proc. Natl Acad. Sci. USA 116, 9808–9813 (2019).
35. Burke, M., Hsiang, S. M. & Miguel, E. Global non-linear effect of temperature on economic production. Nature 527, 235–239 (2015).
36. De Cian, E., Hof, A. F., Marangoni, G., Tavoni, M. & van Vuuren, D. P. Alleviating inequality in climate policy costs: an integrated perspective on mitigation, damage and adaptation. Environ. Res. Lett. 11, 074015 (2016).
37. Evans, D. J. & Sezer, H. Social discount rates for member countries of the European Union. J. Econ. Stud. 32, 47–59 (2005).
38. Weitzman, M. L. Can negotiating a uniform carbon price help to internalize the global warming externality? J. Assoc. Environ. Resour. Econ. 1, 29–49 (2014).
39. Barrett, S. in Conflicts and Cooperation in Managing Environmental Resources (ed. Pethig, R.) 11–35 (Springer, 1991).
40. Carraro, C. & Siniscalco, D. in The Economics of Sustainable Development (eds Goldin, I. & Winters, L. A.) 264–288 (Cambridge Univ. Press, 1995).
41. Kornek, U. & Edenhofer, O. The strategic dimension of financing global public goods. Eur. Econ. Rev. 127, 103423 (2020).
42. Lazarus, M. & van Asselt, H. Fossil fuel supply and climate policy: exploring the road less taken. Climatic Change 150, 1–13 (2018).
43. Canadell, J. G. & Raupach, M. R. Managing forests for climate change mitigation. Science 320, 1456–1457 (2008).
44. Glachant, M. & Dechezleprêtre, A. What role for climate negotiations on technology transfer? Clim. Policy 17, 962–981 (2017).
45. Schultes, A. et al. Optimal international technology cooperation for the low-carbon transformation. Clim. Policy 18, 1165–1176 (2018).
46. Paroussos, L. et al. Climate clubs and the macro-economic benefits of international cooperation on climate policy. Nat. Clim. Change 9, 542–546 (2019).
47. Chichilnisky, G. & Heal, G. Who should abate carbon emissions? An international viewpoint. Econ. Lett. 44, 443–449 (1994).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© The Author(s), under exclusive licence to Springer Nature Limited 2020

Methods sovereignty–equity trade-off mentioned in the main text2,4,5,10. Fulfilling
the separability property requires the assumption that the transfers do
Modelling framework not change the carbon intensity of the goods bundles that are produced
The analysis is based on an integrated model framework that couples a in donor countries and transferred to recipient countries. Hence, given
model of the energy sector and macroeconomic growth with a land-use income reductions ΔYr in region r compared with its baseline income
sector model. It covers all GHG emissions including CO2 emissions from Yr, the equal-effort-sharing criterion is implemented by choosing a set
energy conversion, industry and land-use change, which are limited by of transfers Tr such that the equal-effort-sharing criterion is fulfilled,
the global carbon budget. The other GHGs are priced based on conver- which equals the global relative income loss on the right-hand side:
sion factors assumed to equal the global warming potentials that the
ΔYr − Tr ∑ r ΔYr
Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment = , ∀ r.
Yr ∑ r Yr
Report (AR5) documented.
The REMIND model and the MAgPIE model divide the world into For the case of regionally differentiated carbon taxes without trans-
12 regions (Extended Data Fig. 8). Both models are coupled using a fers (that is, Tr = 0), the carbon price trajectories are varied (1) for each
soft-link approach29. For REMIND, version 2.0 is applied, which is avail- region individually to fulfil the equal-effort-sharing criterion for all
able open access. For MAgPIE, version 4.1 is used. Both models are open regions and (2) the global set of carbon prices is adjusted to comply
access on GitHub (see Code availability). with the global carbon budget. These two adjustments are imple-
The REMIND model is solved by finding the international prices for mented sequentially. First, for each region, the carbon price trajec-
traded goods and energy carriers using a fixed-point iteration based tory is increased (decreased), if the regional effort is smaller (larger)
on a Nash-type algorithm. It is known that this algorithm replicates than the global effort, measured as the mitigation costs relative to Yr.
the Negishi approach, but is computationally more efficient45. The Second, this updated set of carbon prices is shifted upwards (down-
CO2 prices are adjusted between each iteration to comply with the wards), if the endogenous cumulative emissions are below (above) the
prescribed global carbon budget. The precision for the carbon budget is exogenously assumed carbon budget. Both adjustments are gradual
better than 1%. For the scenarios in this study, the international spillover in each iteration step so that the fixed-point iteration converges in a
of technology learning is not internalized45. computationally efficient and robust.
The MAgPIE model is run subject to regional food demands as well For the derivation of the trade-off curve, the starting point is the set
as regional carbon and non-CO2 GHG emission prices and regional of regional carbon prices p ∼ with full differentiation and no transfers.
r
biomass feedstock production requirements. The present version The derivation requires a series of model runs with adjusted carbon
includes the carbon removal option of afforestation. It is assumed that prices and derivation of residual transfers. Each model run comprises
the carbon removals are remunerated with only half the CO2 price in three steps (see also Extended Data Fig. 2).
each region. This discount reflects precautions related to monitoring, First, the spread of carbon prices in 2030 is compressed by the fol-
reporting and verification as well as issues related to the permanency lowing function:
of the integrity of the carbon stores.
∼ α
Net present values (NPVs) are used for monetary values that are ∼min  pr 
pr = p
aggregates of annual values over a time horizon, such as mitigation  ∼min 
p 
costs, income and transfers. For the discounting of these future values,
we apply a 5% annual discount rate. For each region, the carbon price is normalized to the smallest
regional carbon price p ∼min. This is the carbon price spread shown in
Implementation of uniform and differentiated carbon policies Fig. 2a. The exponent α is varied between zero and one, which com-
In this study, we prescribe carbon prices as taxes that develop over time presses the carbon price spread for each region. The multiplication of
and that are recycled on a lump-sum basis domestically. Over time, the the compressed ratio (p ∼ /p
∼min)α with p∼min gives the updated carbon
r
tax paths start after 2020 and increase with an annual growth rate of price pr. Applying this formula to all regional carbon prices results in
5% until 2060. Afterwards, the tax paths continue to increase linearly the compressed set of carbon prices. Varying α leads to different degree
until 2100. The flattening of the carbon price trajectories is motivated of compression. When α = 1, the compression is neutral, thus pr = p ∼.
r
by the potentially huge carbon budget overshoots that result if the When α = 0, the compression is complete and the result is the uniform
exponential growth is continued by the end of the century48,49. carbon price case.
The carbon budgets are not implemented directly as quantity targets Second, the set of compressed regional carbon prices is jointly
to derive the resulting carbon prices. This would assume a globally rescaled in a new REMIND–MAgPIE run so that the global cumulative
integrated emissions permit market. The implementation of uniform CO2 emissions comply with the exogenous carbon budget. This step is
carbon tax paths and the adjustments to meet the carbon budget is an necessary because the set of regional carbon prices from the first step
approximation to the cost-optimal solution. This approach allows the does not comply with the carbon budget. The adjustment of the carbon
implementation of the uniform carbon price policy as well as the poli- price set maintains the price spread, but adjusts the overall level so that
cies with differentiated carbon prices, which are of particular interest in the CO2 emissions comply with the carbon budget.
this study. The quality of the approximation to the socially optimal result Finally, the residual transfers are computed based on this result so
becomes weaker if the economic value of the flexibility to temporarily that the equal-effort-sharing criterion is fulfilled.
overshoot the carbon budget increases. The trade-off curve derived with this methodology does not rep-
The case with uniform carbon prices is implemented by assum- resent a frontier in the space of global transfers and mitigation costs.
ing a parameterized shape of the carbon price trajectory that starts Such a frontier associates to every global transfer volume the mini-
after 2020. This carbon tax path is shifted up until the endogenous mal additional global mitigation costs, while observing the carbon
cumulative CO2 emissions up until 2100 are sufficiently close (<1%) budget and the equal-effort-sharing criterion. It cannot be excluded
to the exogenously assumed carbon budget. The transfers necessary that an improved method allows the identification of combinations
to achieve the equal-effort-sharing criterion are computed ex-post. that allow a stronger reduction of transfers for each increase of global
This approach assumes that international transfers do not change the mitigation costs; that is, points below the trade-off curve in Fig. 3a
global carbon price and the emissions in each region. This is fulfilled cannot be excluded. Although such a frontier exists, its numerical
by the REMIND–MAgPIE model and is the basis of analysis of varying derivation is computationally very demanding. The reason is that
international carbon permit allocations, that is, the analysis of the for each transfer level, all regional price trajectories (here 12) need
to be varied independently to find the minimal additional mitigation defined in the 2010 Cancun Agreement, it still is not a precise com-
costs while the carbon budget, the equal-effort-sharing parison owing to open questions regarding definitions, accounting,
criterion and the global transfer volume are held constant. additionality, finance origin and the objectives of the available funds51,52.
The compression function avoids the independent price variations and The controversies about these open questions make it very difficult
is therefore computationally more efficient, but only approximates to give a concluding statement on whether OECD countries are on
the frontier. Alternative compression functions perform differently. track to fulfil this pledge by 2020, but these controversies all the more
We found that the exponential function performs best. Extended strengthen the argument that achieving the high rates of transfers
Data Fig. 7d shows the comparison of the exponential with a linear required in future decades with uniform carbon prices is a serious
compression function. The linear function is inferior because the result- hurdle for implementing such an international policy architecture. To
ing trade-off curve is located above the trade-off curve derived with be sure: the numbers shown in Extended Data Fig. 5 refer only to trans-
the exponential function. This means that for every transfer level, the fers to equate the burden sharing with respect to mitigation costs, so
exponential compression function derives a set of regional carbon financing for adaptation and loss and damage (itself a very thorny and
prices that leads to smaller additional mitigation costs. controversially debated topic)53 would need to come on top. As another
The differentiation of the carbon prices to fit with the equal-effort- point of reference, OECD countries since 1970 also have a target of con-
sharing criterion could call into question the uniqueness of the price tributing 0.7% of their GDP to official development assistance; though
vector that solves the problem. Given that the solution of uniform here again, the accounting and track record is ambiguous, and further
carbon prices is unique, we do not identify a convincing argument that complicated by the overlap with climate finance52.
the differentiated carbon price vector fulfilling the equal-effort-sharing
criterion is not unique. The numerical solution behaviour and the Historic and scenario data
sensitivity analysis supported this finding. Further research Figure 1a–d uses data from the World Bank’s World Development Indi-
into the analytics of the uniqueness of differentiated price vectors cators and BP’s World Energy Statistics54,55.
could lead to new and more general insights. Figure 1e, f uses data provided by the IPCC’s latest reports in the
scenario databases. These databases in turn rely on scenarios that
Analytical treatment of trade-off curve have been derived in various projects that applied a broad set of IAM
The nonlinearity of the trade-off curve follows directly from the neces- models under different boundary conditions and policies. Extended
sary condition for the optimality of the uniform price solution and the Data Table 1 summarizes the selection of scenarios that have been used
shape of the marginal abatement cost (MAC) functions that are consistent for the analysis. It is worth noting that the databases include more sce-
with standard assumptions in models of environmental economics. The narios for these projects, but these are usually only sensitivity cases.
basic argument is that around the cost-optimal solution, the distribu- Including the sensitivity cases could lead to biased results that arise
tional aspect (orange rectangles in Extended Data Fig. 1) is substantially only from the different project designs and their number of sensitivity
larger than the efficiency aspect (red triangles in Extended Data Fig. 1). cases. The scenarios were provided by the following projects: EMF27 (ref. 56),
The REMIND–MAgPIE model does not explicitly comprise MACs, AMPERE (ref. 57), RoSE (ref. 58), ADVANCE (ref. 28), SSP (ref. 33), CD-LINKS (refs. 59, 60), EMF33 (ref. 61) and GEA (ref. 62).
but they can be derived using a series of model runs that vary carbon The relative mitigation costs are derived as differences between a
prices to derive carbon emissions. Studies using REMIND–MAgPIE have policy case and the corresponding no-policy baseline case. The differ-
shown that the conditions for the implicit MACs are fulfilled, ent models shown in Fig. 1f report different metrics of mitigation costs.
and similar investigations have been performed with other Where possible, we used differences in GDP. If this indicator was not
models as well32,50. available, we used additional energy system costs or the area under
the marginal abatement cost curve63.
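The analytical argument for the nonlinearity of the trade-off curve can be made concrete with a stylized two-region calculation. The following is a sketch under assumptions that are not part of the model: linear MAC functions $f_A(R) = c_A R$ and $f_B(R) = c_B R$ with hypothetical slopes $c_A$ and $c_B$, a uniform-price starting point $p = c_A R_A^* = c_B R_B^*$, and a shift of abatement $\Delta R$ from region B to region A that keeps total abatement constant.

```latex
\begin{align}
\Delta AC_A &= \int_{R_A^*}^{R_A^* + \Delta R} c_A R \,\mathrm{d}R
             = p\,\Delta R + \tfrac{1}{2}\, c_A\, \Delta R^2, \\
\Delta AC_B &= \int_{R_B^* - \Delta R}^{R_B^*} c_B R \,\mathrm{d}R
             = p\,\Delta R - \tfrac{1}{2}\, c_B\, \Delta R^2, \\
\Delta AC_A - \Delta AC_B &= \tfrac{1}{2}\,(c_A + c_B)\, \Delta R^2 .
\end{align}
```

Under these assumptions the regional cost burdens each shift by a term of first order, $p\,\Delta R$ (the rectangles), whereas the increase in global mitigation costs is of second order in $\Delta R$ (the triangles). For a small price differentiation the distributional effect therefore dominates the efficiency loss, which is consistent with the shape of the trade-off curve described in the text.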
Research note on transfers
The transfers mentioned and discussed in the main text are computed
as NPVs over the time horizon 2020–2100. There is no explicit state- Data availability
ment about the timing of the transfer payments. This requires addi- Scenario data have been uploaded in Zenodo with the identifier https://
tional assumptions. Here we investigate the transfers over time in the doi.org/10.5281/zenodo.4010426. Source data are provided with this
uniform carbon price case assuming that all regions experience the paper.
same relative effort in each period. Extended Data Fig. 5 summarizes
the results for the OECD and the non-OECD regions. Note that differ-
ences in the percentage figures in both regions are due to differences Code availability
in gross domestic product (GDP) development. The model codes of REMIND (identifier: ab2c995116e7fb402f6d-
The transfers from the OECD region increase rapidly over time and d0183724496373af996e) and MAgPIE (identifier: 950bc7a08fd0e6c8f-
stabilize at around 1% GDP in 2050. The reverse flow in 2020 is due to the 790c1399c7837133233e2fc) are open source (https://github.com/
NDCs that lead to higher mitigation costs in the OECD region than in the remindmodel/remind and https://github.com/magpiemodel/magpie).
non-OECD region. This reverse financial flow considers only the actual
emissions limitations according to the NDCs, but does not account for 48. Obersteiner, M. et al. How to spend a dwindling greenhouse gas budget. Nat. Clim.
differences in historic emissions. Given the assumption of equal effort Change 8, 7–10 (2018).
49. Realmonte, G. et al. An inter-model assessment of the role of direct air capture in deep
in all periods and regions, the NDCs imply that the region of developing mitigation pathways. Nat. Commun. 10, 3277 (2019).
and emerging countries needs to be the donor. Starting in 2025, the 50. Azar, C., Johansson, D. J. A. & Mattsson, N. Meeting global temperature targets—the role
uniform carbon pricing policy is applied, which leads to the expected of bioenergy with carbon capture and storage. Environ. Res. Lett. 8, 034004 (2013).
51. Lundsgaarde, E., Dupuy, K. & Persson, A. Coordination Challenges in Climate Finance
direction of transfers from the OECD region to the non-OECD region. (Danish Institute for International Studies, 2018); https://www.econstor.eu/
For orientation purposes, we also added the line of 0.22% to bitstream/10419/204624/1/1042180393.pdf
Extended Data Fig. 5, which is the percentage share of GDP in 2020 con- 52. Motty, M. & Ackom, E. K. in Climate Action (eds Leal Filho, W. et al.) 1–11 (Springer
International Publishing, 2019); https://doi.org/10.1007/978-3-319-71063-1_104-2
sidering the US$100 billion target by 2020 pledged by donor countries 53. Sharma, A. Precaution and post-caution in the Paris Agreement: adaptation, loss and
after negotiations over the course of various Conferences of the Parties damage and finance. Clim. Policy 17, 33–47 (2017).
under the United Nations Framework Convention on Climate Change. 54. BP Statistical Review of World Energy 2020 (BP, 2020); https://www.bp.com/en/global/
corporate/energy-economics/statistical-review-of-world-energy/downloads.html
The US$100 billion pledge was first introduced in the Copenhagen 55. World Development Indicators, DataBank (World Bank, 2020); https://databank.
Accord of 2009. Although this pledge was somewhat more clearly worldbank.org/source/world-development-indicators#
56. Kriegler, E. et al. The role of technology for achieving climate policy objectives: overview 67. Bauer, N. et al. Bio-energy and CO2 emission reductions: an integrated land-use and
of the EMF 27 study on global technology and climate policy strategies. Climatic Change energy sector perspective. Clim. Change https://doi.org/10.1007/s10584-020-02895-z
123, 353–367 (2014). (2020).
57. Riahi, K. et al. Locked into Copenhagen pledges—implications of short-term emission
targets for the cost and feasibility of long-term climate goals. Technol. Forecast. Soc.
Change 90, 8–23 (2015). Acknowledgements We acknowledge funding from the German Federal Ministry of
58. Kriegler, E. et al. Will economic growth and fossil fuel scarcity help or hinder climate Education and Research (BMBF) in the Funding Priority ‘Economics of Climate Change’
stabilization? Climatic Change 136, 7–22 (2016). (DIPOL: 01LA1809A). This work was supported by the European Union’s Horizon 2020
59. Roelfsema, M. et al. Taking stock of national climate policies to evaluate implementation research and innovation programme under grant agreement numbers 730403
of the Paris Agreement. Nat. Commun. 11, 2096 (2020). and 821124.
60. McCollum, D. L. et al. Energy investment needs for fulfilling the Paris Agreement and
achieving the Sustainable Development Goals. Nat. Energy 3, 589–599 (2018); correction Author contributions N.B., C.B., D.K. and A.S. developed the policy analysis framework and
3, 699 (2018). designed the experiments. N.B. and A.S. implemented the policy framework. N.B. wrote the
61. Bauer, N. et al. Global energy sector emission reductions and bioenergy use: overview of manuscript (with assistance from C.B. and G.L.). N.B. led the analysis and writing of the
the bioenergy demand phase of the EMF-33 model comparison. Climatic Change https:// manuscript with contributions from all authors.
doi.org/10.1007/s10584-018-2226-y (2018).
62. Riahi, K. et al. in Global Energy Assessment—Toward a Sustainable Future 1203–1306 Competing interests The authors declare no competing interests.
(Cambridge Univ. Press, 2012).
63. Krey, V. et al. in IPCC Climate Change 2014: Mitigation of Climate Change (eds Edenhofer, Additional information
O. et al.) 1281–1328 (Cambridge Univ. Press, 2014). Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-
64. Huppmann, D., Rogelj, J., Krey, V., Kriegler, E. & Riahi, K. A new scenario resource for 2982-5.
integrated 1.5 °C research. Nat. Clim. Change 8, 1027–1030 (2018). Correspondence and requests for materials should be addressed to N.B.
65. KC, S. & Lutz, W. The human core of the shared socioeconomic pathways: population Peer review information Nature thanks Hancheng Dai, Ritu Mathur, Wei Peng and the other,
scenarios by age, sex and level of education for all countries to 2100. Glob. Environ. anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer
Change 42, 181–192 (2017). reports are available.
66. South, A. rworldmap: a new R package for mapping global data. The R Journal 3, 35–43 Reprints and permissions information is available at http://www.nature.com/reprints.
(2011).
Extended Data Fig. 1 | Graphical illustration of the distributional effects equal costs; this case is not depicted explicitly. The case ‘Hybrid’ differentiates
between advanced economy A and developing economy B for different carbon prices to reduce T. The change of global mitigation cost is only the
policy frameworks characterized by different marginal abatement cost difference between the regional mitigation costs ΔAC_a − ΔAC_b (indicated by the
functions f_A and f_B. The case ‘Uniform price w/o transfers’ with carbon prices red triangles), whereas the changes of transfers are represented by the orange
equal to p in both regions implies different mitigation costs. In the case rectangles pΔR. As long as the differentiation of prices is relatively small, the
‘Uniform price w/ transfers’ these differences are neutralized by transfers T. decline of transfers exceeds the increase in global mitigation costs.
Alternatively, in the case ‘No transfer’ the differentiation of policies leads to
Extended Data Fig. 2 | Illustration of the exponential compression regions and the light grey line highlights the compression for Latin America
function. The x axis shows the 2030 carbon prices in the full differentiation and the EU. We note that the set of compressed carbon prices is scaled to
case (see Fig. 2a). The example applies the parameter α = 0.5 to the compression comply with the global carbon budget (that is, the relative differences of the
function $p_r = \tilde{p}^{\min}(\tilde{p}_r/\tilde{p}^{\min})^{\alpha}$ that has been introduced in the Methods. The y axis carbon price spread remain constant).
shows the carbon prices after compression. The figure shows a subset of
Extended Data Fig. 3 | Socioeconomic drivers and CO2 emissions in the c, d, Energy and industry CO2 emissions (c) as well as total CO2 emissions (d).
no-policy baseline scenario. a, b, Population (a) and economic growth (b) See ‘Data availability’ section for more details.
from history (1990–2015) and in the SSP2 baseline scenario (2015–2100; refs. 30, 65).
Extended Data Fig. 4 | Time path of transfers in the default case with
uniform carbon prices. The transfers are expressed as percentages of GDP in
the OECD and the non-OECD regions. The dashed line serves as a point of
orientation. It represents the share of the US$100 billion relative to the OECD’s
GDP in 2020.
Extended Data Fig. 5 | Effect of carbon price differentiation on primary and energy use increases. Second, OECD countries mostly reduce residual oil and
final energy use. a, b, Changes in the global energy mixes distinguished by gas consumption, but non-OECD countries mostly increase the use of coal;
energy carriers and regions. The figure depicts differences compared with the therefore, the total consumption of coal increases, whereas the global use of oil
uniform carbon tax case for the 1,300 GtCO2 carbon budget. Primary energy (a) and gas decreases. Third, the total use of biomass increases due to increasing
is measured according to the direct equivalence principle; final energy (b) is demand in OECD countries. Fourth, OECD countries accelerate modernization
measured as delivered to final consumer. This means that fossil fuels, biomass of final energy use by mainly reducing the use of liquids and gases, but
and geothermal energy are measured in primary energy input, whereas increasing electricity and hydrogen. Finally, non-OECD countries delay
renewables (hydro, wind and solar) as well as nuclear energy are measured by modernization of energy use by mainly increasing the use of solids, liquids and
their electricity output. Notable results are as follows. First, the total amount of gases with corresponding implications for air pollution and so on.
Extended Data Fig. 6 | Effect of carbon price differentiation on land-use exported to OECD countries. Moreover, OECD countries increase investments
change and investments. a, b, Changes in global land use (a) and investment in the energy sector substantially to facilitate the transition to low-carbon
and regions (b). The figure depicts differences compared with the uniform technologies and to invest into carbon-removal technologies (which are
carbon tax case for the 1,300 GtCO2 carbon budget. The two regions show counted as part of the energy sector). The higher OECD investments crowd out
opposite changes in the variables; for example, OECD countries convert the macroeconomic investments in OECD countries and energy sector
agricultural land into forests to remove carbon by afforestation, whereas investments in non-OECD countries.
non-OECD countries convert forests into cropland to grow biomass that is
Extended Data Fig. 7 | Sensitivity analysis of the trade-off curve. a, The technologies that rely on underground geological storage of CO2 (that is, CCS
sensitivity of the ban on bioenergy trade. b, The sensitivity with respect to the including direct air capture). d, The application of the linear compression
carbon budget. c, The variation of the maximum annual retirement rate of function; in this sensitivity analysis the case of uniform taxes and the
fossil-fuelled infrastructure from 9% to 6% and the delayed availability of no-transfer case are identical.
Extended Data Fig. 8 | Regional aggregates used in REMIND and MAgPIE. Regions and countries belonging to the OECD region are coloured in blue tones.
See ‘Data availability’ section for more details. World map based on rworldmap package66.
Extended Data Table 1 | Overview of data sources for model comparisons
Scenario data in Fig. 1e, f were taken from the IPCC databases for the Fifth Assessment Report (AR5) and the Special Report on 1.5 °C warming (SR15; ref. 64). The carbon budgets are computed for the
period 2011–2100. Figure 1e, f shows relative differences between the ‘Policy case’ and the ‘Base case’.
Extended Data Table 2 | Overview of selected sensitivity cases
The mitigation costs and transfers are NPVs applying a 5% annual discount rate. The inefficiency is the difference between the mitigation costs in the differentiated and the uniform case. The
bold numbers are quoted in the text. The carbon prices in the default case are shown in Fig. 2a. The income growth sensitivity case uses per-capita GDP and population projections of SSP1 (but
the demands for final energy and food are the same as in the default case, which relies on SSP2). Note that in case of faster income growth and convergence (right-hand column), the carbon
price in the uniform case as well as the absolute mitigation costs increase, but the ratio of transfers to mitigation costs decreases. If the metric of consumption losses is used to measure effort,
the losses of fossil fuel-rich countries are more severe in the uniform price case and, consequently, in the differentiated case the carbon prices in these countries are lower.
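As a reading aid for the NPV figures, the discounting can be sketched as follows. This is a minimal illustration, assuming annual values and a chosen base year; the function name `net_present_value`, the base year and the example cost stream are hypothetical, and the model's actual time steps may differ.

```python
def net_present_value(annual_values, base_year, rate=0.05):
    """Discount a series of annual values back to base_year.

    annual_values: dict mapping year -> monetary value in that year
    base_year:     year to which all values are discounted (an assumption here)
    rate:          annual discount rate; 0.05 corresponds to the stated 5%
    """
    return sum(
        value / (1.0 + rate) ** (year - base_year)
        for year, value in annual_values.items()
    )


# Hypothetical cost stream, for illustration only.
costs = {2020: 100.0, 2021: 105.0, 2022: 110.25}
print(round(net_present_value(costs, base_year=2020), 2))
```

In this example the costs grow at exactly the discount rate, so each year contributes the same discounted amount.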
Article
Clustered versus catastrophic global vertebrate declines
https://doi.org/10.1038/s41586-020-2920-6 Brian Leung1,2 ✉, Anna L. Hargreaves1, Dan A. Greenberg3, Brian McGill4,5, Maria Dornelas6
& Robin Freeman7

Recent analyses have reported catastrophic global declines in vertebrate populations1,2.
Published online: 18 November 2020
However, the distillation of many trends into a global mean index obscures the
variation that can inform conservation measures and can be sensitive to analytical
decisions. For example, previous analyses have estimated a mean vertebrate decline
of more than 50% since 1970 (Living Planet Index2). Here we show, however, that this
estimate is driven by less than 3% of vertebrate populations; if these extremely
declining populations are excluded, the global trend switches to an increase. The
sensitivity of global mean trends to outliers suggests that more informative indices
are needed. We propose an alternative approach, which identifies clusters of extreme
decline (or increase) that differ statistically from the majority of population trends. We
show that, of taxonomic–geographic systems in the Living Planet Index, 16 systems
contain clusters of extreme decline (comprising around 1% of populations; these
extreme declines occur disproportionately in larger animals) and 7 contain extreme
increases (around 0.4% of populations). The remaining 98.6% of populations across
all systems showed no mean global trend. However, when analysed separately, three
systems were declining strongly with high certainty (all in the Indo-Pacific region) and
seven were declining strongly but with less certainty (mostly reptile and amphibian
groups). Accounting for extreme clusters fundamentally alters the interpretation of
global vertebrate trends and should be used to help to prioritize conservation efforts.
Rapid global change is threatening species across the globe1. The quan- one population declined by 99%. Even if a second population increased
tification of biodiversity trends is important to assess whether current 50-fold or 393 populations increased by 1% (that is, a large net increase),
investment is slowing or reversing declines, and to identify regions a geometric mean would show a catastrophic 50% decline. Thus, a
and taxa of concern. Although distilling disparate population trends geometric mean decline of 50% could arise from substantial, wide-
into a single global index can focus attention on biodiversity trends2–4, spread loss that is occurring across many populations (we term this the
simple metrics can distort the full picture. ‘catastrophic declines’ hypothesis) or from a few extremely declining
Estimates of global biodiversity trends vary depending on their populations (we term this the ‘clustered declines’ hypothesis). Both
data and mathematical model. The most apocalyptic models gather scenarios involve important conservation issues, but suggest vastly
extensive press coverage, even when based on controversial data (for different underlying problems and require different mitigation strate-
example, ‘biological annihilation’5, which described trend estimates gies14, thus distinguishing between them is of real-world importance.
based largely on expert opinion; or ‘insect Armageddon’, which is based We derive a Bayesian hierarchical mixture (BHM) model to distin-
on data disputed by the original collectors6). However, even analyses of guish between the catastrophic and clustered declines hypotheses. The
the best available data reach conflicting conclusions. An analysis of a model statistically separates population trends into extreme declines,
global dataset of abundance time series of vertebrates estimated that, typical trends and extreme increases (Fig. 1), while accounting for
on average, vertebrate populations have declined by more than 50% time-series size, within-population fluctuations, number of popula-
since 1970 (Living Planet Index2 (LPI)); however, other global analyses tions and among-population variance. We test declines in abundance
found that the mean population size7,8 and species richness9,10 have for more than 14,000 vertebrate populations (from the LPI)15. We chose
remained stable over similar timeframes. Explanations for the discrep- LPI data because of its large scope, because the data and analytical
ancies have been proposed8,11–13, but not resolved. details were publicly available, and because previous analyses of these
One crucial consideration is that summary indices may be easily data suggested widespread, global declines2.
misinterpreted. Calculating the geometric mean across populations We first examined whether the previous estimate2 of a mean decline
is the most common and straightforward approach, but is strongly of more than 50% was sensitive to extreme populations: robust declines
influenced by extremes. To illustrate, imagine an ecosystem in which would support the catastrophic declines hypothesis, whereas high
1Department of Biology, McGill University, Montreal, Quebec, Canada. 2Bieler School of Environment, McGill University, Montreal, Quebec, Canada. 3Department of Biological Sciences, Simon
Fraser University, Burnaby, British Columbia, Canada. 4School of Biology and Ecology, University of Maine, Orono, ME, USA. 5Mitchell Center for Sustainability Solutions, University of Maine,
Orono, ME, USA. 6Centre for Biological Diversity, University of St Andrews, St Andrews, UK. 7Indicators and Assessments Unit, Institute of Zoology, Zoological Society of London, London, UK.
✉e-mail: brian.leung2@mcgill.ca

[Fig. 2 plot: geometric growth index (y axis, 0.6 to 1.0) against year (x axis, 1970 to 2010); lines for all populations (n = 14,700) and for 120, 238 and 356 populations removed.]
Fig. 2 | Effect of extreme populations on the global growth index. Removing a small fraction of extreme populations strongly influences the geometric growth index, using the LPI dataset. Each line represents a different number of removed populations, ranging from no removals (red line; all 14,700 populations, which show a >50% mean decline) to removing 356 populations (yellow line, the removal of <2.4% of populations switches the global trend from negative to positive). A geometric growth index of 1 indicates no change (dashed horizontal black line).
[Fig. 1 plot: panels a–e, each with x axis log(mean growth rate) from –0.1 to 0.1.]
Fig. 1 | Stylized patterns of system-wide growth rates. a–e, Similar geometric
mean population growth rates (log(N_{t+1}/N_t)) can reflect contrasting systems.
c, As a null model, systems can be stable (log-transformed growth rates centred
around zero). Deviations can occur in multiple ways. a, b, Most populations in a
Evidence for clustered declines
system can be in substantial decline (catastrophic declines hypothesis) (a) or
the system can have multiple clusters, in which the majority of populations Among the 57 domain–realm–taxon systems of the LPI, 16 systems con-
show a distribution of growth rates centred around zero but with a small cluster tained clusters of extreme decline and 8 contained clusters of extreme
of populations experiencing extreme declines (clustered declines hypothesis) growth (of those, 3 systems are repeated, as they had both clusters of
(b). Each has the same metric of mean decline (vertical red line indicates a 1.5% extreme decline and growth) (Fig. 3 and Supplementary Table 2). Together,
annual decline, corresponding to a 50% loss over 50 years), even though most clusters of extreme decline accounted for only 1% of populations across
populations in b are stable. The converse can also happen; systems in which a systems (2% of populations in the 16 systems in which they occurred).
small cluster of populations shows an extreme increase, but that show an The mean population trend for extremely declining clusters across the 16
otherwise stable distribution (d) or systems in which most populations systems was θ2 = −3.94, or approximately 98% loss per year, and deviated
increase (e) can also occur (vertical blue line indicates a 1.5% annual increase, substantially from the mean trend of the primary cluster in those systems.
corresponding to a doubling over 50 years).
Clusters of extreme growth accounted for 0.4% of populations across
systems (2.4% in the 8 systems in which they occurred), with θ2 = 3.51, that
is, an explosive 33× growth per year (Fig. 3 and Supplementary Table 2).
sensitivity to a few populations would support the clustered declines Extreme clusters showed some taxonomic and geographic patterns.
hypothesis (Fig. 1). We then applied our BHM model to assess the evi- The largest cluster of extreme declines was in Arctic marine mammals,
dence for catastrophic or clustered declines globally and by region and accounting for 7.6% of populations in that system. However, mam-
taxonomy. Finally, we explore two additional conservation issues. First, malian systems generally had the fewest clusters of extreme decline
we test whether declines occur disproportionately in larger animals (19% of 16 systems), followed by reptile–amphibian systems (21% of 14
(large animals tend to have lower reproductive rates), which might systems), whereas bird and fish systems had more clusters of extreme
release small animals from predation16. Second, previous analyses declines (31% of 16 and 45% of 11 systems, respectively) (Fig. 3). Clus-
often excluded time series with few data points10,12,17, but small time ters of extreme decline occurred throughout the world, half of which
series make up most of the available data. We test the effects of their occurred in marine realms, whereas extreme increases occurred more
exclusion18. in temperate regions or terrestrial realms (Fig. 3).
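The cluster means θ quoted in this section can be converted to proportional change through the exponential. The following Python sketch shows that arithmetic, assuming that θ is a natural-log annual growth rate (which matches the figures quoted in the text) and that the rate is constant over time; the helper names are hypothetical.

```python
import math


def annual_change(theta):
    """Proportional change per year implied by a mean log growth rate."""
    return math.exp(theta) - 1.0


def cumulative_change(theta, years):
    """Proportional change after `years` at a constant log growth rate."""
    return math.exp(theta * years) - 1.0


# Extreme-decline clusters: theta = -3.94
print(f"{annual_change(-3.94):.1%} per year")
# Extreme-growth clusters: theta = 3.51, expressed as a yearly multiplier
print(f"{math.exp(3.51):.1f}x per year")
# A 1.5% annual decline compounded over 50 years, as in Fig. 1
print(f"{cumulative_change(math.log(1 - 0.015), 50):.1%} over 50 years")
```

The results are close to the rounded values given in the text: a loss of about 98% per year, growth of about 33 times per year, and a loss of roughly half over 50 years.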
Extreme population trends occurred predominantly in small time
series. Excluding time series with fewer than 10 points not only removed
Sensitivity of geometric mean to extreme populations all but two extreme clusters, but also removed 52% of the data (Sup-
The geometric mean index that underlies the LPI analysis was highly sen- plementary Table 3). The higher frequency of extreme trends among
sitive to extreme populations. Excluding only the 2.4% most-strongly small time series was also apparent in the raw data (Fig. 4). Thus the
declining populations (354 out of 14,700 populations) reversed the decision of whether to include small time series will have large effects
estimate of global vertebrate trends from a loss of more than 50% on the resulting estimates of global trends.
to a slightly positive growth (Fig. 2). Similarly, excluding 2.4% of the Body size was related to population trends. Larger species had three
most-strongly increasing populations strengthened the mean decline times more extreme declines than increases (15 clusters of extreme
to 71%. High sensitivity suggests that extreme populations are dispro- decline compared with 5 clusters of extreme increase). Comparatively,
portionately affecting global trend estimates, such that clusters of smaller species had half as many (8) extremely declining and dispro-
extreme population decline should be considered explicitly. portionately more (7) extremely increasing clusters (Supplementary

a Palearctic
Nearctic
Neotropical
Indo-Malayan
Afrotropical
b Nearctic Palearctic
Neotropical Afrotropical Indo-Malayan
c Tropical and subtropical Atlantic north Arctic

temperate Pacific north
Indo-Pacific
temperate
*
Atlantic tropical
and subtropical *
South temperate
and Antarctic
Fig. 3 | Population trends by taxonomic groups and realms. a, The terrestrial orange, strong non-significant declines; green, strong non-significant
realm. b,The freshwater realm. c, The marine realm. Red and blue asterisks increases; yellow, weak changes). Maps were created using ArcGIS software by
indicate the occurrence of extremely declining clusters (16 systems) and Esri (ArcGIS and ArcMap are the intellectual property of Esri and are used
increasing clusters (8 systems), respectively. Distributions show the primary herein under licence. Copyright © Esri. All rights reserved. For more
cluster in each system. Red, significant declines; blue, significant increases; information about Esri software, please visit https://www.esri.com).
Table 4). Although size-specific models included fewer populations, overall growth rate of primary clusters was close to zero: θ1 = −0.00035,
especially for smaller species, the number of clusters was not uniformly corresponding to around 1.7% loss over 50 years, given a constant
lower (as might be expected given a reduction in power); therefore, the rate across populations and time (Fig. 5). In addition, in contrast to
differential occurrence of extremely declining versus increasing clusters extreme clusters, primary cluster trends were robust to time-series
suggests that large animals are more vulnerable to extreme declines. size, as excluding series with fewer than 10 data points yielded a similar
overall global trend (θ1 = 0.0043) (Extended Data Fig. 3).
Although the global BHM model reveals considerably more nuance
Evidence for catastrophic declines than a geometric mean index, analysing across systems still masked
In contrast to the extreme clusters, the primary clusters accounted for important patterns. When systems were analysed separately (Supple-
the vast majority (98.6%) of populations across the 57 LPI systems. The mentary Table 2), primary population clusters were strongly declining

(θ1 < −0.015) with high certainty (95% credible intervals not overlapping zero) in three systems, all of which occurred in the Indo-Pacific realm (freshwater mammals, freshwater birds and terrestrial birds) (Fig. 3). This suggests that this region has the highest risk of system-wide declines and should be a conservation priority. By contrast, the primary cluster was increasing with high certainty in seven systems, six of which were in temperate regions. In addition, seven additional systems had strongly declining primary population clusters but with less certainty (95% credible intervals overlapped zero), four of which were amphibian or reptile groups. Finally, 14 systems showed strong but low-certainty increases, with no obvious taxonomic or geographic patterns (Fig. 3).
Each primary cluster also contained variation among populations. In the 10 systems with significant or non-significant mean declines where θ1 < −0.015, 87% of the individual populations showed strong declines (Fig. 5). These 10 systems accounted for around 20% of the total global vertebrate populations, but for around 61% of strong declines. The multimodality observed in Fig. 5 was an outcome of aggregating unimodal primary clusters across systems, and suggests that there are heterogeneous stressor levels among systems (that is, similar principles to those that cause extreme clusters within systems). The remaining approximately 11% of strongly declining populations were distributed across 47 out of 57 systems; it is unclear whether they represent a deviation from the natural dynamics that are expected to occur in any naturally variable system.
Primary cluster trends were related to body size, but not as predicted. In comparison to the overall patterns for larger animals, the same systems showed significant declines and increases, but two additional temperate systems showed significant increases (Extended Data Fig. 4 and Supplementary Table 4). Smaller species also appeared to decline more than larger species; there were 27 systems in which smaller species had more-negative growth rates than larger species, compared with 18 systems in which the reverse was true. However, analyses of the smaller species were based on substantially fewer populations, and trends were generally not significant (Supplementary Table 4), so patterns remain tentative.

Fig. 4 | Effect of the size of the time series. The number of data points in the time series versus the mean log-transformed value of the geometric mean growth rate.

Fig. 5 | Populations in the primary clusters across all systems, after removal of extreme clusters. The primary cluster of each system is unimodal, but because systems are experiencing decline (or growth) heterogeneously, plotting distributions across systems shows multimodality. Histograms show significantly declining systems (red), strongly but not significantly declining systems (orange) and weak changes or increases (yellow). Vertical lines show thresholds for strongly declining (−0.015) and strongly increasing (+0.015) growth rates, corresponding to an approximate 50% loss or a doubling (over 50 years), respectively. Distributions of primary clusters were calculated based on the mean and s.d. from the hierarchical model, and using the system-specific weights to adjust for species richness.

Discussion
By re-analysing a comprehensive dataset of global wildlife population trends, we show that previously estimated global declines are driven by a few extremely declining populations. Removing only 2.4% of declining populations reversed the estimated global trends from more than 50% mean decline since 1970 to a slightly positive growth. Our BHM model revealed that clusters of extreme decline are widespread and occur disproportionately in larger species, and that a few clusters of extreme increase also exist and occur disproportionately in smaller species. This is consistent with previous arguments of ‘trophic downgrading’16.
Clusters of extreme declines were largely due to small time-series datasets. However, neither random sampling error nor ‘saw tooth’ population dynamics (in which ultimately stable populations experience sudden declines followed by gradual increases) can fully explain this association (see Supplementary Information for a full discussion). Additional explanations are needed. Extreme trends could reflect transient populations that naturally leave or enter a survey area19, which could represent natural dynamics. Alternatively, researchers may stop sampling after populations become (close to) extirpated, although the converse has also been suggested20. A third possibility is that some regions experience both lower sampling effort and greater declines, such that poorly sampled datasets correlate with factors linked to vulnerability, such as lower national wealth or conservation investment. Understanding why small time series contain so many extreme declines is particularly important given that studies that did not find widespread declines often excluded short time series7,10,12, potentially reconciling divergent findings among studies.
Once extreme clusters were statistically separated, no global trend remained across typical populations (that is, primary clusters; 98.6% of populations). However, aggregating systems into one global trend hid important variation. Three systems, all of which occurred in the Indo-Pacific realm, showed widespread vertebrate declines across typical populations. Moreover, among typical populations smaller species may be faring worse than larger ones. Although these results
were tentative given lower sample sizes and high uncertainty, this trend is contrary to common conservation assumptions and so merits additional research.
Our results emphasize an important point: biodiversity trends within and across regions and taxa are highly disparate. This probably reflects differences in both susceptibility and exposure to anthropogenic environmental change21–23. Unravelling this variation is imperative to understand in which regions biodiversity is threatened the most24 and which conservation actions promote stability or recovery. A productive global conversation about conservation requires that both scientists and media pay more attention to variation and resist the temptation of simple summary indices.
Shifting the message from ubiquitous catastrophe to foci of concern also touches on human psychology. Continual negative and guilt-ridden messaging can cause despair, denial and inaction25,26. If everything is declining everywhere, despite the expansion of conservation measures in recent decades, it would be easy to lose hope. Our results identify not only regions that need urgent action to ameliorate widespread biodiversity declines, but also many systems that appear to be generally stable or improving, and thus provide a reason to hope that our actions can make a difference.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2920-6.

1. IUCN. The IUCN Red List of Threatened Species. version 2019-3 http://www.iucnredlist.org (2019).
2. WWF. Living Planet Report 2018: Aiming Higher (eds. Grooten, N. & Almond, R. E. A.) (WWF, 2018).
3. Rosenberg, K. V. et al. Decline of the North American avifauna. Science 366, 120–124 (2019).
4. Sánchez-Bayo, F. & Wyckhuys, K. A. G. Worldwide decline of the entomofauna: a review of its drivers. Biol. Conserv. 232, 8–27 (2019).
5. Ceballos, G., Ehrlich, P. R. & Dirzo, R. Biological annihilation via the ongoing sixth mass extinction signaled by vertebrate population losses and declines. Proc. Natl Acad. Sci. USA 114, E6089–E6096 (2017).
6. Willig, M. R. et al. Populations are not declining and food webs are not collapsing at the Luquillo Experimental Forest. Proc. Natl Acad. Sci. USA 116, 12143–12144 (2019).
7. Daskalova, G. N., Myers-Smith, I. H. & Godlee, J. L. All is not decline across global vertebrate populations. Preprint at https://doi.org/10.1101/272898 (2018).
8. Dornelas, M. et al. A balance of winners and losers in the Anthropocene. Ecol. Lett. 22, 847–854 (2019).
9. Vellend, M. et al. Global meta-analysis reveals no net change in local-scale plant biodiversity over time. Proc. Natl Acad. Sci. USA 110, 19456–19459 (2013).
10. Dornelas, M. et al. Assemblage time series reveal biodiversity change but not systematic loss. Science 344, 296–299 (2014).
11. Gonzalez, A. et al. Estimating local biodiversity change: a critique of papers claiming no net loss of local diversity. Ecology 97, 1949–1960 (2016).
12. Leung, B., Greenberg, D. A. & Green, D. M. Trends in mean growth and stability in temperate vertebrate populations. Divers. Distrib. 23, 1372–1380 (2017).
13. McGill, B. J., Dornelas, M., Gotelli, N. J. & Magurran, A. E. Fifteen forms of biodiversity trend in the Anthropocene. Trends Ecol. Evol. 30, 104–113 (2015).
14. Anderson, S. C., Branch, T. A., Cooper, A. B. & Dulvy, N. K. Black-swan events in animal populations. Proc. Natl Acad. Sci. USA 114, 3252–3257 (2017).
15. LPI. Living Planet Index. www.livingplanetindex.org/ (2016).
16. Estes, J. A. et al. Trophic downgrading of planet Earth. Science 333, 301–306 (2011).
17. Connors, B. M., Cooper, A. B., Peterman, R. M. & Dulvy, N. K. The false classification of extinction risk in noisy environments. Proc. R. Soc. Lond. B 281, 20132935 (2014).
18. Hanks, E. M., Hooten, M. B. & Baker, F. A. Reconciling multiple data sources to improve accuracy of large-scale prediction of forest disease incidence. Ecol. Appl. 21, 1173–1188 (2011).
19. Youngflesh, C. & Lynch, H. J. Black-swan events: population crashes or temporary emigration? Proc. Natl Acad. Sci. USA 114, E8953–E8954 (2017).
20. Fournier, A. M. V., White, E. R. & Heard, S. B. Site-selection bias and apparent population declines in long-term studies. Conserv. Biol. 33, 1370–1379 (2019).
21. Newbold, T. et al. Ecological traits affect the response of tropical forest bird species to land-use intensity. Proc. R. Soc. Lond. B 280, 20122131 (2013).
22. Venter, O. et al. Sixteen years of change in the global terrestrial human footprint and implications for biodiversity conservation. Nat. Commun. 7, 12558 (2016).
23. Allan, J. R. et al. Hotspots of human impact on threatened terrestrial vertebrates. PLoS Biol. 17, e3000158 (2019).
24. Blowes, S. A. et al. The geography of biodiversity change in marine and terrestrial assemblages. Science 366, 339–345 (2019).
25. O’Neill, S. & Nicholson-Cole, S. “Fear won’t do it”: promoting positive engagement with climate change through visual and iconic representations. Sci. Commun. 30, 355–379 (2009).
26. Brennan, L. & Binney, W. Fear, guilt, and shame appeals in social marketing. J. Bus. Res. 63, 140–146 (2010).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© The Author(s), under exclusive licence to Springer Nature Limited 2020
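
The annual log growth rates quoted in the main text (θ1 = −0.00035 as roughly 1.7% loss over 50 years, and ±0.015 as roughly a 50% loss or a doubling) can be checked with a short calculation. The sketch below assumes natural logarithms and a constant annual rate, which reproduces the quoted figures; the function name `cumulative_change` is illustrative and does not come from the authors' code.

```python
import math

def cumulative_change(log_growth_rate: float, years: int = 50) -> float:
    """Proportional change in abundance after `years` at a constant annual
    log growth rate (natural log assumed)."""
    return math.exp(log_growth_rate * years) - 1.0

# Values of theta1 quoted in the text: global primary-cluster mean and the
# thresholds for 'strong' decline and 'strong' increase.
for theta in (-0.00035, -0.015, 0.015):
    change = cumulative_change(theta)
    print(f"theta1 = {theta:+.5f} -> {change:+.1%} over 50 years")
```

Under these assumptions a rate of −0.015 compounds to slightly more than a 50% loss, and +0.015 to slightly more than a doubling, consistent with the thresholds used in Fig. 5.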

Methods

Dataset
The publicly available LPI dataset includes 15,241 vertebrate populations from 3,510 species15. When a species contained both finer-resolution estimates within a country (2,593 entries) and a country-wide aggregate, we excluded the country-wide aggregate (537 entries), yielding 14,700 populations. LPI groups species into 57 systems defined by a combination of habitat domain (terrestrial, freshwater or marine), biogeographical realm (terrestrial/freshwater realms, Afrotropical, Nearctic, Neotropical, Palearctic, Indo-Pacific; marine, Arctic, Atlantic north temperate, Atlantic tropical/sub-tropical, Pacific north temperate, Indo-Pacific tropical/sub-tropical, South-temperate/Antarctic) and taxonomic grouping (fish, Actinopterygii, Elasmobranchii, Holocephali, Myxini, Chondrichthyes, Sarcopterygii, Cephalaspidomorphi; birds, Aves; mammals, Mammalia; herps, Amphibia, Reptilia) (Extended Data Figs. 5–8).
To analyse the effect of body size, we obtained information on each taxonomic group. Given the diversity of vertebrate groups in this dataset and the different conventions across groups, we used different measures of body size for each taxonomic class on the basis of data availability. For birds (n = 1,397), mammals (n = 534) and reptiles (Squamata, n = 132; Testudines, n = 44; and Crocodylia, n = 16) we used estimates of the mass of the species (in grams) collated in an extensive comparative dataset27. When mass data were missing for a species (n = 14 birds; n = 1 mammal; n = 25 reptiles), we estimated body mass as the geometric mean of available mass estimates for species in that genus. For fishes (Chondrichthyes, Osteichthyes and Agnatha; n = 1,211), estimates of mass were scarce for most species, so we instead used estimates of total length or standard length (in centimetres), both of which were extracted from FishBase28 using the rfishbase R package29. These length estimates are an imperfect proxy for size (in terms of mass) given the variability in body plans across groups, but given the large amount of variation across these groups it suffices as a way to broadly categorize species into distinct size classes. For amphibians, we used estimates of snout–vent length (in millimetres) as our proxy for body size, as this is the most widely available metric of size across species. Data on snout–vent length for amphibian species (n = 175) were extracted from a comprehensive ecological trait dataset: AmphiBio30.

Sensitivity of the geometric indices to extreme population trends
The LPI analysis was based on a geometric mean approach, calculated by summing across log-transformed growth rates31. We recreated the geometric-mean-based analyses (see Supplementary Information 1a for full details and model formulation) and examined the sensitivity of the global estimate to extreme populations. We ordered populations and sequentially removed the largest observed decline, determining the effect of each removal on the global estimate of biodiversity loss. Low sensitivity would indicate that many or most populations are declining, supporting the catastrophic declines hypothesis. High sensitivity (that is, if removal of relatively few populations switched the strongly negative global trend to neutral or positive) would support the clustered declines hypothesis. For balance, we also examined sensitivity to sequential removal of the greatest increasing populations.

Catastrophic versus clustered declines approach
We developed an approach to separate extreme population clusters the growth or decline of which statistically deviated from typical population trends, such that a small number of extreme populations would no longer mask trends of the majority of populations (Fig. 1). Although some summarization is needed to understand global trends, heterogeneous growth rates and potentially multimodal distributions could be expected, given multiple stressors with diverse effects, and differences in species vulnerabilities. We used a BHM model as our statistical architecture, as it has several desirable properties: (1) it can represent the null model and assess deviations from it; (2) it enables testing for both negative and positive extremes (sometimes both existed in the same system); (3) it quantifies the magnitude and proportion of those extremes; (4) it provides a coherent way to separate extreme populations from the majority of populations (the primary cluster), which enables tests of the clustered and catastrophic declines hypotheses; (5) it provides a measure of uncertainty as a direct outcome of analysis (through the posterior distribution); and (6) it accounts for population fluctuations and adjusts for the number of data points in the time series.
First, we specify the null model. Even in a system with no overall trend, we expect stochastic fluctuations in population size. We also expect some populations to be increasing or decreasing during any time interval, given complex, real-world ecological dynamics. Thus, the null model should include among-population heterogeneity, and therefore consists of a distribution of growth rates (Fig. 1c). Statistical deviations from this null model could be caused by a shift in the overall distribution, in which a system-wide mean growth < 0 (that is, decline) could indicate a risk to the entire system, which would support the catastrophic declines hypothesis (Fig. 1a). Alternatively, statistical deviation from the null model could be caused by a few populations that experience extreme declines, which is consistent with the clustered declines hypothesis (Fig. 1b).
To specify our model, we begin with a standard Bayesian hierarchical formulation (that is, it does not yet contain mixtures of distributions). We define θ and τ as the system-wide mean and variance, respectively, of log-transformed growth rates across all populations in the system (that is, hyperparameters in Bayesian terminology). θ and τ determine the distribution of the log-transformed population trends (μi) and define the properties of the overall system. However, within-population dynamics are also occurring, and the log-transformed growth rates for population i at time t are modelled as a population trend (μi) and within-population fluctuations (σ) (see Supplementary Information 1b for full details and model formulation).
Using a standard Bayesian hierarchical model, we can test the catastrophic declines hypothesis by determining the probability that a system-wide mean value of θ < 0. Testing the clustered declines hypothesis, however, requires a mixture model to assess the evidence for the occurrence of clusters. Thus, we define K as the number of clusters in the mixture and fk as the fraction of populations in the kth cluster, and θ, τ and f denote the vectors of the parameters for the K clusters.
To test the clustered declines hypothesis, we modelled three clusters: a primary cluster, corresponding to the typical trend; a negative extreme cluster; and a positive extreme cluster (Fig. 1). Although our main interest was in the mechanisms behind apparent global population declines (that is, catastrophic versus clustered declines hypotheses), we also assayed positive extreme clusters so that analyses were not biased to find only negative population trends. We considered four cluster combinations: (1) a single distribution; (2) a primary distribution and a negative extreme distribution; (3) a primary distribution and a positive extreme distribution; or (4) a primary distribution and both positive and negative extreme distributions (Fig. 1). For referencing purposes, we denote k = 1 as the primary cluster, k = 2 as the negative extreme cluster, and k = 3 as the positive extreme cluster. Reality need not be bi-modal (or tri-modal), but exploring generalities in trends necessitates some aggregation. Nonetheless, the extreme clusters identified by the mixture model could contain multiple extreme modes in the data (or even result from a skewed distribution). With any of these deviations, model selection would still choose the mixture model as explaining the data better than a single normal distribution (see Supplementary Information 1c for full details and model formulation).
We used the (lowest) deviance information criterion value to select the mixture model with the strongest statistical evidence32. The catastrophic declines hypothesis would be supported by a mean decline of the primary population cluster (θ1 < 0 and credible intervals did not
overlap zero), and would be particularly severe if the mean θ1 was also strongly negative (for example, θ1 = −0.015 would correspond to >50% loss over 50 years). The clustered declines hypothesis would be supported if the deviance information criterion selected a mixture with a negative extreme cluster (combinations 2 or 4 above). The catastrophic and clustered declines hypotheses are not mutually exclusive, as a system could have both a negative extreme cluster and declining primary cluster. A large fraction of populations in the negative extreme cluster (f2) could also be interpreted as widespread catastrophic declines, but this did not occur in our results. Although our hypotheses focus on understanding declining trends, our model will also detect increases in abundances.
To estimate the model parameters, we used Bayesian analyses and the Markov chain Monte Carlo algorithm, which simultaneously estimated uncertainty. For each Bayesian analysis, we ran 3 chains, each with 10,000 iterations (3,000 used for burn-in). Convergence was determined using R̂ ≈ 1. Values of R̂ for all parameters across all systems fell within 0.999 < R̂ < 1.005. Bayesian analyses were conducted using the STAN language33, and processed and analysed in R34.
Additionally, we explored the theoretical behaviour of each model, including the geometric mean model, in the presence of clustered declines (Supplementary Information 1d, 2a), and our catastrophic and clustered declines approach given our selection of priors, application of constraints and other modelling choices; these simulation analyses showed that our approach yielded appropriate theoretical behaviour (Extended Data Fig. 1 and Supplementary Information 1e, 2b). Finally, we conducted sensitivity analyses and showed that results were robust to modelling choices (Extended Data Fig. 2, Supplementary Information 2c and Supplementary Table 1).

Application of the catastrophic and clustered approach to LPI data
We tested for extreme clusters in each of the 57 domain–realm–taxon systems of the LPI, by choosing the mixture model with the lowest deviance information criterion value. We also examined the number of populations in each cluster, as a fraction of the total number of populations, scaled using LPI system-specific weightings35 (see Supplementary Information 1f for more details).
Next, we examined evidence for the catastrophic declines hypothesis in each system by searching for negative mean growth rates in the primary cluster (θ1). We defined ‘high certainty’ of decline (or increase) as 95% credible intervals that did not overlap zero, and ‘strong’ decline as θ1 < −0.015, corresponding to a ~50% decline if it persisted for 50 years (θ1 > 0.015 was used for a strong positive trend, corresponding to a doubling over 50 years).
We assessed the effect of small time series on both extreme clusters and trends in primary clusters, by omitting all data with fewer than 10 points, as has often been done in other studies12. These small time series accounted for 52% of the population estimates (7,110 populations remained in the analysis).
Finally, we examined whether trends differed between large- versus small-bodied animals. Within each class (but with Agnatha lumped with Osteichthyes), we scaled body size as standard deviations on the natural log scale, thus creating an index of relative species size within a taxonomic group. In two cases, we separated out different groups within a class that had relatively distinct body plans that would influence this size scaling. We scaled size within the superorder Batoidea (Rajiformes, Myliobatiformes and Torpediniformes) and separately scaled size for the rest of the chondrichthyans (Selachimorpha and Holocephali). For the amphibians, we separated out the orders Caudata and Anura and scaled size within each of these groups. For each taxonomic group, we scaled body size and separated species into larger-than-average (hereafter ‘larger’) versus smaller-than-average (hereafter ‘smaller’) species. This yielded 9,596 populations from 1,765 larger species, and 5,103 populations from 1,745 smaller species. We then reran the BHM model for larger animals and again for smaller animals. Body sizes were divided unevenly among habitat domains and realms; 12 domain–realm–taxon systems contained ≤1 smaller species so were excluded from the small-animal model.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
Data can be obtained from the LPI database (www.livingplanetindex.org), AmphiBio30 (https://figshare.com/articles/Oliveira_et_al_AmphiBIO_v1/4644424) and FishBase28 (www.fishbase.org); life-history traits can be obtained from the amniote life-history database27 (https://doi.org/10.6084/m9.figshare.c.3308127.v1).

Code availability
Code for the BHM model is available at: https://doi.org/10.5281/zenodo.3901586.

27. Myhrvold, N. P. et al. An amniote life-history database to perform comparative analyses with birds, mammals, and reptiles. Ecology 96, 3109 (2015).
28. Froese, R. & Pauly, D. FishBase version 12/2019 www.fishbase.org (2019).
29. Boettiger, C., Lang, D. T. & Wainwright, P. C. rfishbase: exploring, manipulating and visualizing FishBase data from R. J. Fish Biol. 81, 2030–2039 (2012).
30. Oliveira, B. F., São-Pedro, V. A., Santos-Barrera, G., Penone, C. & Costa, G. C. AmphiBIO, a global database for amphibian ecological traits. Sci. Data 4, 170123 (2017).
31. Collen, B. et al. Monitoring change in vertebrate abundance: the living planet index. Conserv. Biol. 23, 317–327 (2009).
32. Gelman, A., Hwang, J. & Vehtari, A. Understanding predictive information criteria for Bayesian models. Stat. Comput. 24, 997–1016 (2014).
33. Carpenter, B. et al. Stan: a probabilistic programming language. J. Stat. Softw. 76, 1–32 (2017).
34. R Core Team. R: A Language and Environment for Statistical Computing http://www.R-project.org/ (R Foundation for Statistical Computing, 2016).
35. McRae, L., Deinet, S. & Freeman, R. The diversity-weighted living planet index: controlling for taxonomic bias in a global biodiversity indicator. PLoS ONE 12, e0169156 (2017).

Acknowledgements We thank E. Hudgins, D. Nguyen, S. Varadarajan and A. Jones for discussions, T. Coulson for comments and S. Varadarajan and F. Moyes for help creating the figures. This work was supported by a Natural Sciences and Engineering Research Council (NSERC) Discovery grant to B.L.

Author contributions Authors are listed in order of their contributions. B.L. formulated the BHM model, conducted analyses and wrote the majority of the paper. A.L.H. discussed and clarified the ideas and had a central role in the writing of the paper. D.A.G. discussed and clarified the ideas, synthesized the data and contributed to the writing of the paper. B.M. discussed and clarified the ideas, and commented on the manuscript. M.D. discussed and clarified the ideas, commented on and improved the presentation of the manuscript. R.F. discussed and clarified the ideas, and provided insight into the LPI data and analyses.

Competing interests The authors declare no competing interests.

Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2920-6.
Correspondence and requests for materials should be addressed to B.L.
Peer review information Nature thanks Tim Coulson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
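
The sequential-removal sensitivity test described in the Methods can be illustrated with a toy calculation. The sketch below is not the authors' implementation: it assumes equal weighting of populations (the LPI uses system-specific weights), natural-log annual growth rates and a 50-year horizon, and the data and function names are hypothetical.

```python
import math

def geometric_mean_change(log_rates, years=50):
    """Cumulative proportional change implied by the mean annual log growth
    rate (an unweighted geometric-mean index, assumed for illustration)."""
    mean_log_rate = sum(log_rates) / len(log_rates)
    return math.exp(mean_log_rate * years) - 1.0

def sensitivity_to_declines(log_rates, max_removed):
    """Recompute the index after removing the 0, 1, ..., max_removed
    strongest declines (most negative log growth rates)."""
    if max_removed >= len(log_rates):
        raise ValueError("cannot remove every population")
    ordered = sorted(log_rates)  # most negative first
    return [geometric_mean_change(ordered[k:]) for k in range(max_removed + 1)]

# Hypothetical toy system: 97 near-stable populations and 3 extreme declines.
rates = [0.001] * 97 + [-0.5, -0.4, -0.3]
changes = sensitivity_to_declines(rates, max_removed=3)
for removed, change in enumerate(changes):
    print(f"{removed} removed -> {change:+.1%} over 50 years")
```

In this toy example the index moves from a large apparent loss to slight growth once the three extreme populations are removed, which is the qualitative pattern that the clustered declines hypothesis predicts.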
Extended Data Fig. 1 | Theoretical analyses of BHM model. The p–p plots show that the posterior distributions for each estimated parameter are unbiased and largely follow a 1:1 line for each hyperparameter (σ, τ) as well as the fraction in each cluster (f1, f2 = 1 − f1). The 1:1 line is the theoretical expectation, indicating that the true parameter value falls below the 0.01 quantile 1% of the time, the 0.02 quantile 2% of the time, and so on.
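
The logic of a p–p calibration check of this kind can be sketched with a model simple enough to have a closed-form posterior. The example below swaps in a conjugate normal model (prior θ ~ N(0, 1), observations y ~ N(θ, 1)) as a stand-in for the BHM, so it illustrates the diagnostic only; the function names, seed and simulation sizes are assumptions.

```python
import random
from statistics import NormalDist

def posterior_quantile_of_truth(rng, n_obs=10):
    """Simulate one dataset and return the posterior quantile at which the
    true parameter falls (conjugate normal model, known unit variance)."""
    theta_true = rng.gauss(0.0, 1.0)                      # draw from the prior
    ys = [rng.gauss(theta_true, 1.0) for _ in range(n_obs)]
    post_var = 1.0 / (1.0 + n_obs)                        # prior precision 1 + n
    post_mean = post_var * sum(ys)
    return NormalDist(post_mean, post_var ** 0.5).cdf(theta_true)

def pp_curve(quantiles, probs):
    """Fraction of simulations in which the truth falls at or below each
    nominal posterior quantile."""
    n = len(quantiles)
    return [sum(q <= p for q in quantiles) / n for p in probs]

rng = random.Random(1)
u = [posterior_quantile_of_truth(rng) for _ in range(2000)]
probs = [i / 100 for i in range(1, 100)]
observed = pp_curve(u, probs)
max_gap = max(abs(o - p) for o, p in zip(observed, probs))
print(f"largest departure from the 1:1 line: {max_gap:.3f}")
```

When the posterior is well calibrated, the observed fractions track the nominal probabilities and the largest departure from the 1:1 line is small.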
Extended Data Fig. 2 | Sensitivity analyses of primary cluster trends. The
trends of the primary clusters (θ1), for the main analysis (x axis) versus the
sensitivity analysis (y axis) for the threshold for extreme clusters (top) and the
offset when n = 0 was observed (bottom).
Extended Data Fig. 3 | Effect of small time series on primary cluster trends.
Each point represents a trend estimate for the primary cluster of a system, with
the full dataset (x axis) versus data excluding time series with fewer than 10 data
points (y axis). The red dot indicates the freshwater Indo-Pacific mammals,
which was reduced from 22 populations (full) to 2 populations (only data with
at least 10 data points).
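
The exclusion rule behind this comparison (dropping time series with fewer than 10 data points) amounts to a simple filter. The sketch below uses hypothetical population identifiers and abundance values; only the threshold of 10 points comes from the text.

```python
def split_by_length(series_by_population, min_points=10):
    """Separate time series with at least `min_points` observations from
    shorter ones."""
    kept, dropped = {}, {}
    for pop_id, series in series_by_population.items():
        target = kept if len(series) >= min_points else dropped
        target[pop_id] = series
    return kept, dropped

# Hypothetical toy data: population id -> abundance estimates through time.
toy = {
    "pop_a": [120, 118, 121, 119, 117, 122, 120, 118, 119, 121, 120],  # 11 points
    "pop_b": [40, 12],                                                  # 2 points
    "pop_c": [15, 18, 22, 30],                                          # 4 points
}
kept, dropped = split_by_length(toy)
fraction_dropped = len(dropped) / len(toy)
print(sorted(kept), sorted(dropped), round(fraction_dropped, 2))
```

Reporting the fraction dropped alongside any filtered result makes the cost of the exclusion explicit, as the text does when it notes that 52% of the data were removed.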
Extended Data Fig. 4 | Mean trends of primary clusters across systems
calculated using the BHM model. Top, all species (14,700 populations).
Middle, only large species (9,596 populations). Bottom, only small species
(5,103 populations). The small species appear to be declining more than large
species, although this finding needs to be interpreted with caution, as most
primary distributions did not significantly deviate from zero for small species.
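
The split into 'larger' and 'smaller' species rests on scaling body size as standard deviations on the natural log scale within each taxonomic group. A minimal sketch of that scaling follows; the species names and sizes are hypothetical, and the use of the population standard deviation and the assignment of a score of exactly zero to 'smaller' are assumptions that the text does not specify.

```python
import math
from collections import defaultdict
from statistics import fmean, pstdev

def classify_body_size(species):
    """species: iterable of (name, group, size). Returns
    {name: (z_score, 'larger' | 'smaller')}, with z computed on the natural
    log scale within each group."""
    by_group = defaultdict(list)
    for name, group, size in species:
        by_group[group].append((name, math.log(size)))
    result = {}
    for items in by_group.values():
        logs = [value for _, value in items]
        mean_log = fmean(logs)
        sd_log = pstdev(logs) or 1.0   # guard against single-species groups
        for name, value in items:
            z = (value - mean_log) / sd_log
            result[name] = (z, "larger" if z > 0 else "smaller")
    return result

# Hypothetical toy data: bird masses in grams, amphibian snout-vent lengths in mm.
toy = [
    ("bird_a", "Aves", 10.0),
    ("bird_b", "Aves", 20.0),
    ("bird_c", "Aves", 1000.0),
    ("frog_a", "Anura", 30.0),
    ("frog_b", "Anura", 60.0),
]
sizes = classify_body_size(toy)
for name, (z, label) in sorted(sizes.items()):
    print(f"{name}: z = {z:+.2f} ({label})")
```

Because scaling is done within each group, a 60 mm frog is classed as 'larger' even though it is far smaller than any bird, which is the sense in which the index measures relative species size within a taxonomic group.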
Extended Data Fig. 5 | Histograms of observed growth rates and output of the BHM model for systems 1–16. Blue line, primary cluster; red line, extreme cluster(s) from the model. Grey vertical lines show the range of observed values. In comparing the model output to the data we show the following. (1) The variation of the BHM primary cluster (blue line) is much lower than the raw data, because the BHM separates variation in among-population trends from variation due to within-population fluctuations. (2) The BHM model identifies evidence for extreme clusters in both directions (for example, terrestrial Indo-Pacific birds) or only one direction (for example, terrestrial Neotropical mammals), but not for other apparent clusters (for example, terrestrial Indo-Pacific herps). The BHM integrates the magnitude of within-population fluctuations, time-series sizes, number of populations, among-population variance, and the magnitude and frequency of the extreme populations in determining whether additional (extreme) clusters are needed to account for the observations.
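Point (1) of this caption, that raw growth rates vary more than the underlying among-population trends, can be illustrated with a short simulation. This is a sketch under stated assumptions, not the BHM itself: it swaps in a simple method-of-moments correction, assumes normally distributed annual growth rates, and uses hypothetical parameter values (`tau`, `sigma`, `n_years`) chosen only for illustration.

```python
import random
from statistics import mean, variance


def simulate(n_pop=2000, n_years=10, grand_mean=0.0, tau=0.05, sigma=0.2, seed=1):
    """Return (variance of raw trend estimates, estimated among-population variance)."""
    rng = random.Random(seed)
    est_trends, sampling_vars = [], []
    for _ in range(n_pop):
        # True trend of this population, drawn from the among-population distribution.
        mu_i = rng.gauss(grand_mean, tau)
        # Annual growth rates fluctuate around the true trend (within-population noise).
        growth = [rng.gauss(mu_i, sigma) for _ in range(n_years)]
        # Raw trend estimate and its estimated sampling variance.
        est_trends.append(mean(growth))
        sampling_vars.append(variance(growth) / n_years)
    raw_var = variance(est_trends)
    # Method of moments: raw variance = among-population variance + mean sampling variance.
    among_var = raw_var - mean(sampling_vars)
    return raw_var, among_var


if __name__ == "__main__":
    raw_var, among_var = simulate()
    print(f"variance of raw trend estimates:     {raw_var:.4f}")
    print(f"estimated among-population variance: {among_var:.4f} (simulated value: {0.05 ** 2:.4f})")
```

In this toy setting the raw estimates are inflated by sampling noise of roughly sigma² / n_years, which is why a distribution fitted to the among-population component alone is narrower than the histogram of observed values.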
Extended Data Fig. 6 | Histograms of observed growth rates and output of the BHM model for systems 17–32. Blue line, primary cluster; red line, extreme
cluster(s) from the model. Grey vertical lines show the range of observed values. For further information, see Extended Data Fig. 5.
nature research | reporting summary
Corresponding author(s): Brian Leung
Last updated by author(s): Aug 27, 2020
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
A description of all covariates tested

A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient)
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Software and code

Policy information about availability of computer code
Data collection: Data from the Living Planet Index database (<www.livingplanetindex.org/>, 2016) were scraped in R 3.6.3. Data were extracted from Fishbase using rfishbase 3.0.4.
Data analysis: Bayesian analyses were conducted using the Stan 2.14 language, and processed and analyzed in R 3.6.3. The lme4 1.1-23 package was referenced in the text. Custom code from this article can be obtained at https://doi.org/10.5281/zenodo.3901586
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers.
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability
Data can be obtained from the Living Planet Index database (<www.livingplanetindex.org/>, 2016), the AmphiBIO database at <https://figshare.com/articles/Oliveira_et_al_AmphiBIO_v1/4644424>, the Fishbase database <www.fishbase.org>, and mammal, bird and reptile life history traits from <https://doi.org/10.6084/m9.figshare.c.3308127.v1>.
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf
Ecological, evolutionary & environmental sciences study design

All studies must disclose on these points even when the disclosure is negative.
Study description: The study re-examined previous estimates of global vertebrate declines from the Living Planet Index (LPI) and demonstrated that the decline is driven by a small fraction of extreme population trends. The study then used a Bayesian hierarchical mixture model to separate the extreme clusters from the primary cluster. The publicly available LPI dataset consists of 15241 vertebrate populations from 3510 species, grouped into 57 domain-realm-taxon systems (hereafter ‘systems’) (LPI 2016). Each system was analyzed separately, and the systems were also combined into a global metric using weighting factors (adjusted for species richness in each system) from the LPI, for comparability.
Research sample: The data were obtained from the Living Planet Index database (<www.livingplanetindex.org/>, 2016) and consisted of 15241 vertebrate populations. To avoid double counting, when a species had both finer-resolution estimates within a country (2593 entries) and a country-wide aggregate, we excluded the country-wide aggregate (537 entries). This resulted in 14700 populations remaining in our analysis. Each system was defined by a combination of habitat domain (terrestrial, freshwater or marine), biogeographic realm and taxonomic grouping (Fish = Actinopterygii, Elasmobranchii, Holocephali, Myxini, Chondrichthyes, Sarcopterygii, Cephalaspidomorphi; Birds = Aves; Mammals = Mammalia; Herps = Amphibia, Reptilia). Terrestrial and freshwater habitat domains were separated into five realms (Afrotropical, Nearctic, Neotropical, Palearctic and Indo-Pacific), whereas the marine domain was separated into six realms (Arctic, Atlantic north temperate, Atlantic tropical/sub-tropical, Pacific north temperate, Indo-Pacific tropical/sub-tropical and South-temperate/Antarctic).
Sampling strategy: All population time-series data in the LPI dataset were used. To avoid double counting, when a species had both finer-resolution estimates within a country (2593 entries) and a country-wide aggregate, we excluded the country-wide aggregate (537 entries). This resulted in 14700 populations remaining in our analysis.
Data collection: The data were obtained by Dan Greenberg and downloaded from the publicly available databases identified in the data availability statement.
Timing and spatial scale: Data were analyzed for 1970–2014, as this period coincides with the analyses from the Living Planet Index. The spatial scale of the analysis was global. The data comprised 14700 populations across many studies and were thus measured at many scales; relative changes per population were therefore used.
Data exclusions: To avoid double counting, when a species had both finer-resolution estimates within a country (2593 entries) and a country-wide aggregate, we excluded the country-wide aggregate (537 entries). This resulted in 14700 populations remaining in our analysis.
Reproducibility: This is not relevant, as the existing LPI database was used. The study was not an experiment; its purpose was to re-analyze the available information on vertebrate trends, to evaluate whether previous estimates of decline (>50%) were due to clusters of extremely declining populations, and to analyze extreme clusters and primary clusters separately.
Randomization: This is not relevant, as the existing LPI database was used, chosen for its impressive size and geographic coverage, and because previous analyses of these data suggested broad-scale average vertebrate.
Blinding: This is not relevant. Blinding, as done in clinical trials, hides the group assignments of individuals from some researchers. Primary data collection and experiments were not conducted in this study.
Did the study involve field work? Yes No
Reporting for specific materials, systems and methods

We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material,
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response.
Materials & experimental systems (n/a / Involved in the study)
- Antibodies
- Eukaryotic cell lines
- Palaeontology
- Animals and other organisms
- Human research participants
- Clinical data

Methods (n/a / Involved in the study)
- ChIP-seq
- Flow cytometry
- MRI-based neuroimaging
Article
Late Cretaceous bird from Madagascar reveals unique development of beaks

https://doi.org/10.1038/s41586-020-2945-x

Patrick M. O’Connor1,2,3 ✉, Alan H. Turner4, Joseph R. Groenke1, Ryan N. Felice5, Raymond R. Rogers3,6, David W. Krause3,4 & Lydia J. Rahantarisoa7

Mesozoic birds display considerable diversity in size, flight adaptations and feather
organization1–4, but exhibit relatively conserved patterns of beak shape and
development5–7. Although Neornithine (that is, crown group) birds also exhibit
constraint on facial development8,9, they have comparatively diverse beak
morphologies associated with a range of feeding and behavioural ecologies, in
contrast to Mesozoic birds. Here we describe a crow-sized stem bird, Falcatakely
forsterae gen. et sp. nov., from the Late Cretaceous epoch of Madagascar that
possesses a long and deep rostrum, an expression of beak morphology that was
previously unknown among Mesozoic birds and is superficially similar to that of a
variety of crown-group birds (for example, toucans). The rostrum of Falcatakely is
composed of an expansive edentulous maxilla and a small tooth-bearing premaxilla.
Morphometric analyses of individual bony elements and three-dimensional rostrum
shape reveal the development of a neornithine-like facial anatomy despite the
retention of a maxilla–premaxilla organization that is similar to that of nonavialan
theropods. The patterning and increased height of the rostrum in Falcatakely reveals
a degree of developmental lability and increased morphological disparity that was
previously unknown in early branching avialans. Expression of this phenotype
(and presumed ecology) in a stem bird underscores that consolidation to the
neornithine-like, premaxilla-dominated rostrum was not an evolutionary prerequisite
for beak enlargement.
Our understanding of the evolution of Mesozoic birds continues to improve, driven predominantly by discoveries from the Early Cretaceous epoch of China1–3,6. Although these specimens show considerable variation in body size, soft-tissue anatomy and inferred ecologies2–4,10,11, the disparity in Mesozoic avialan cranial shape remains restricted to a relatively limited number of forms that are considered to be either generalists or substrate-probing specialists5,6,12–15 and represent groups that are only distantly related to crown birds. The Late Cretaceous (about 100–66 million years ago) chapter of avialan evolution remains relatively incomplete owing to a paucity of new fossil discoveries (although see recent studies on birds such as Ichthyornis16 and Asteriornis17). Thus, new fossils of Late Cretaceous birds are essential for refining hypotheses that relate to the morphological evolution and diversification of avialans.
The phylogenetic diversity of early branching (non-neornithine) Mesozoic birds is dominated by enantiornithines, which have been heralded as the first diversification of avialans and are characterized by a range of body sizes and inferred habits2,14,18–21. This radiation is notable for its apparent near-global distribution throughout most of the Cretaceous period. An exceptionally well-preserved partial cranium of a previously unknown enantiornithine (University of Antananarivo [UA] 10015) from the latest Cretaceous (Maastrichtian) of Madagascar falls within a critical spatiotemporal gap. Very few avialans are known from the entire Cretaceous period of Afro-Madagascar. The specimen expands our knowledge of realized cranial shape disparity, in terms of both morphological details and the proportions of elements, within the enantiornithine radiation and Mesozoic birds as a whole.

Systematic palaeontology
Theropoda Marsh, 1881
Paraves Sereno, 1997
Avialae Gauthier, 1986
Ornithothoraces Chiappe, 1995
Enantiornithes Walker, 1981
Falcatakely forsterae gen. et sp. nov.

Etymology. ‘Falcata’ (from Latin falcatus), meaning armed with a scythe, in reference to the shape of the rostrum; ‘kely’ (Malagasy), meaning small; ‘forsterae’, in recognition of Catherine A. Forster’s contributions to work on Madagascan paravians.
Holotype. Partial cranium (University of Antananarivo, UA 10015), which consists of the rostrum, palate and periorbital regions (Fig. 1, Extended Data Figs. 1, 2 and Supplementary Videos 1–8).
1Department of Biomedical Sciences, Heritage College of Osteopathic Medicine, Ohio University, Athens, OH, USA.
2Ohio Center for Ecological and Evolutionary Studies, Ohio University, Athens, OH, USA.
3Department of Earth Sciences, Denver Museum of Nature & Science, Denver, CO, USA.
4Department of Anatomical Sciences, Stony Brook University, Stony Brook, NY, USA.
5Centre for Integrative Anatomy, Department of Cell and Developmental Biology, University College London, London, UK.
6Geology Department, Macalester College, St Paul, MN, USA.
7Département de Sciences de la Terre et de l’Environnement, Université d’Antananarivo, Antananarivo, Madagascar.
✉e-mail: oconnorp@ohio.edu

Locality and horizon. Locality MAD05-42, Berivotra Study Area, Upper Cretaceous (Maastrichtian; 72.1–66 million years ago) Anembalemba Member, Maevarano Formation, Mahajanga Basin, northwestern Madagascar22.
Diagnosis. Differs from other paravians on the basis of the following combination of features (*indicates autapomorphies): extended, high maxilla that forms the dorsal contour of the rostrum*; dimpled texture on the nasal and lacrimal, particularly on the triangular caudodorsal process of the latter*; lacrimal with caudally expanded ventral process*; large, flat jugal process of the postorbital*. Further differs from most avialans by: a long, straight quadratojugal process of the jugal; antorbital fenestra nearly as long as tall. Further differs from other enantiornithines by: premaxilla slots into an extended V-shaped sulcus of the maxilla*; narrow rostrum (width at premaxilla–maxilla junction estimated at around 15% maximum width at rostral margin of orbit); a nasal with distinct fossa near the rostral end*.
Remarks. Avialae and Neornithes are used herein to delimit increasingly less inclusive monophyletic assemblages of theropod dinosaurs. Avialae (that is, birds) refers to all theropods closer to living birds than to dromaeosaurids and troodontids (that is, a stem-based monophyletic group containing Passer domesticus and all theropods closer to it than to Dromaeosaurus or Troodon). Neornithes represents the crown group of birds and is the equivalent of Aves (sensu ref. 23); see Supplementary Information for additional phylogenetic definitions. Further supporting information (such as interactive PDFs, matrices and executable files) is available on DRYAD (https://doi.org/10.5061/dryad.mkkwh70wg).

Fig. 1 | Cranium of the Cretaceous enantiornithine bird Falcatakely forsterae (UA 10015, holotype). a, Photograph of the specimen, with a right lateral view of the pre-orbital region (right side of the image) and a ventral view of the palatal region (left side of the image). b, Digital polygon reconstruction from the microcomputed tomography scan of the specimen shown in a. c, Digital polygon reconstruction of the specimen with most elements in b placed in near-life position in right lateral view. Scale bar, 1 cm (a–c). d, Reconstruction (not to scale) illustrating the preserved (in white) elements of the cranium. Left (l) and right (r) sides are indicated. AOF, antorbital fenestra; ect, ectopterygoid; EN, external nares; ITF, infratemporal fenestra; jpmx, jugal process of the maxilla; ju, jugal; lc, lacrimal; mpmx, midline premaxilla; mx, maxilla; na, nasal; pal, palatine; pmx, premaxilla; po, postorbital; pter, pterygoid; qj, quadratojugal; sr, scleral ring; to, tooth.

Cranial osteology
UA 10015 pertains to an enantiornithine bird (estimated cranial length, 8.5 cm) with a high, but extremely narrow, preorbital region (Fig. 1). The lightly built face consists of a long, high edentulous maxilla and a short, tooth-bearing premaxilla, forming a rostrum unlike that of any known bird. The external nares are rostrally positioned and widely separated from a large, parallelogram-shaped antorbital fenestra. The premaxillae are fused rostrally and exhibit a short frontal process as in some other enantiornithines5,24,25. The maxillary process is short and slots into a V-shaped concavity on the maxilla (Fig. 1b, c and Extended Data Figs. 1f, 2c). A single, conical, unserrated tooth is preserved in the left premaxilla; the presence of additional premaxillary teeth is uncertain owing to incomplete preservation.
The maxilla of Falcatakely is less than 1 mm thick and unique among avialans in being extremely high and long, and forming at least 90% of the reconstructed pre-orbital rostrum height (Fig. 1c). It is unfenestrated, a condition shared with Enantiornithes (Supplementary Information), and lacks an antorbital fossa; it also preserves detailed vascular sulci over its surface that indicate the presence of an expansive keratinous rhamphotheca (beak) in life26 (Fig. 1a; interactive PDFs can be found at https://doi.org/10.5061/dryad.mkkwh70wg). The premaxillary process is well-developed and forms part of the ventral border of the external naris. An elongate, tapering caudoventral projection contributes to the jugal bar, delimiting the ventral border of the orbit where it underlies the lacrimal boot.
The elongate nasals expand in width caudally and are unique in possessing dorsomedially positioned fossae near the external nares (Extended Data Fig. 1b). The mid-portion of the nasals is broad and vaulted, as in bohaiornithid enantiornithines5,27, and expresses surface dimpling on the lateral margin near the articulation with the lacrimal (Extended Data Fig. 1b–d). The reconstruction of Falcatakely was generated using microcomputed tomography and reveals a nearly complete right lacrimal (Fig. 1c). The dorsal half of the element is T-shaped, as in avialans such as Archaeopteryx, Pengornis and Parapengornis2,27. The rostrodorsal process is considerably longer than the caudodorsal process and is most similar to the condition in pengornithid enantiornithines (for example, Parapengornis)10. The caudodorsal process is unique among avialans in morphology, being sub-triangular in shape, dimpled and pneumatic (Extended Data Fig. 1d). The ventral ramus is longer than either the caudodorsal or rostrodorsal process, and is extensively excavated, as in pengornithids and bohaiornithids2,19,27. The ventral ramus terminates as a caudally expanded boot that sits just dorsal to the overlapping portions of the jugal and maxilla (Fig. 1c).
The jugal is triradiate with a long, dorsoventrally restricted maxillary process, a distinct postorbital process and an extended, bar-like quadratojugal process (Extended Data Fig. 1e). In contrast to most avialans21,28, the quadratojugal process is long and directed straight caudally, forming the ventral border of the infratemporal fenestra. The quadratojugal process is not bifurcated as in most non-avialan theropods and some phylogenetically early branching birds such as Sapeornis29. The right postorbital (Extended Data Fig. 1e) is represented only by its ventral process, which is flat and tapering, and is unlike any known among avialans or paravians in general30. At least 15 scleral ossicles are present, enough to estimate the external diameter of the scleral ring to be 16–18 mm (Fig. 1c).
The digitally reconstructed palate of UA 10015 (Extended Data Fig. 2) reveals a substantial level of detail not typically observable in Mesozoic

avialans. The palatine is triradiate, with a long, thin rostral process that abuts the maxilla (Extended Data Fig. 2a). The palatine does not contact the jugal and only modestly contacts the pterygoid, but shares an elongate contact with the ectopterygoid. A dorsomedially directed choanal process sweeps towards the midline to join its antimere. Only the thin rostral processes of the pterygoids are preserved in UA 10015; these processes are in close association with the palatines (Extended Data Fig. 2a). The ectopterygoid, an element that is unknown in most Cretaceous avialans24,31, is represented by a robust body and a thin, elongate, uncinate process that contacts the jugal bar (Extended Data Fig. 2a). The vomers are represented by two thin, dorsoventrally restricted laminar plates that extend rostrally between the two maxillae (Extended Data Fig. 2a, c). Thin sheets of bone are present just rostral to the pterygoids, potentially representing the expanded caudal end of the vomer, reminiscent of the condition in Gobipteryx24,31.

Fig. 2 | Mosaic evolution of the avialan facial skeleton as depicted among select early branching forms. Phylogenetic analysis places Falcatakely among enantiornithine birds. The illustration of Xinghaiornis(*) is placed near its approximate position in the phylogeny based on a previous publication15. Illustrations are not to scale. Red, premaxilla; green, maxilla; yellow, nasal; lavender, lacrimal; blue, dentary. Illustrations of Archaeopteryx, Ichthyornis, Hesperornis and Gallus were modified from a previous publication16. See Supplementary Information for additional details for included taxa and phylogenetic analyses.

Mosaic evolution in the avian beak
Our phylogenetic analyses recover Falcatakely nested within Enantiornithes (Fig. 2 and Extended Data Figs. 3, 4). The long, deep and narrow rostrum of Falcatakely, dominated by an expanded maxilla, provides a stark contrast to the facial region formed by the premaxilla and maxilla in other enantiornithines and more-crownward non-neornithines. Even among rostrally elongated ornithothoracine taxa such as Longipteryx, Longirostravis and Dingavis, this morphology is achieved through a concomitant reduction in premaxillary and maxillary height as bones elongate along the rostrocaudal axis5,6,12,14,15.
Quantitative assessment of non-avialan and avialan (including Neornithes) facial shape demonstrates the combination of a derived cranial phenotype in Falcatakely (that is, a neornithine-like expanded rostrum) formed by an underlying plesiomorphic paravian skeletal framework. We used two-dimensional geometric morphometrics (Fig. 3) to compare the maxillary and premaxillary shape in UA 10015 to that of a sample of fossil non-avialan theropods, as well as the crown birds Gallus gallus (red junglefowl) and Nothoprocta pentlandii (Andean tinamou). Principal component analysis reveals that species group together on the basis of the ratio of the maxillary to premaxillary size (the first principal component) and the ratio of the rostrocaudal length to dorsoventral height of both elements (second principal component). Despite having maxillary and premaxillary proportions that are similar to those of non-avialan theropods (for example, paravians, oviraptorosaurs and ornithomimosaurs), Falcatakely exhibits an overall rostrum phenotype that is convergent on a number of neornithine groups.
The configuration of the individual skeletal elements in Falcatakely is more similar to the non-avialans Microraptor and Zanabazar than to ornithuromorphs (including neornithines) owing to the expanded maxilla and relatively small premaxilla. Nonetheless, the three-dimensional shape of the pre-orbital facial skeleton closely resembles that of some extant birds (Extended Data Figs. 5, 6), as assessed using three-dimensional geometric morphometrics to compare the shape of the maxilla, premaxilla and nasal within a sample of 349 extant birds32 (Supplementary Information). Principal component analysis of the rostrum shape reveals that Falcatakely occupies a position in whole-rostrum morphospace that is quantitatively similar to those

of a number of unrelated neornithines, including members of the Ramphastidae (toucans), Phaethonidae (tropicbirds), Columbidae (pigeons and doves) and Tyrannidae (tyrant flycatchers) (an interactive morphospace plot is included as Supplementary Data and at https://doi.org/10.5061/dryad.mkkwh70wg).

Fig. 3 | Geometric morphometric analyses of the facial shape of Falcatakely among paravians. Plot of the first two principal components (PCs) of the two-dimensional landmark analysis of maxillary (blue line segments) and premaxillary (red line segments) morphology of select theropod taxa. The configuration of maxilla and premaxilla in Falcatakely is more similar to that of non-avialans in a two-dimensional analysis focused on fossil taxa, although the overall three-dimensional rostrum phenotype occupies a morphospace converged on by subsequent radiations of neornithine birds (Supplementary Data). See Supplementary Information for analytical protocols. Axes: PC1 (35.7%) and PC2 (30.2%).

The discovery of Falcatakely expands the realized cranial morphology among known non-neornithine birds considerably. Analysis of its three-dimensional anatomy shows that it is a stem bird that occupies a previously unrealized position in rostrum morphospace and potentially exploited an ecology that was not again seen until the diversification of crown-group birds in the mid-Cenozoic era. A partial emancipation of the palate from the facial skeleton (that is, loss of jugal contact with the palatine) concurrent with heretofore unappreciated rostrum elaboration suggests that these regions are functionally and developmentally integrated16,31,33. Notably, a mosaic pattern of palatal release is found among stem avialans, at least insofar as the functional demands of the rostrum or beak in Falcatakely appear to have required reinforcement of the connections to the rear of the face. These connections are maintained through the ectopterygoid and with retention of the robust postorbital linkage, despite the loss of the mid-face palatine connection to the jugal bar. Such an arrangement was probably necessary to stabilize the mid-portion of the cranium and the long, high and extremely narrow rostrum. Although incomplete, the presence of a robust postorbital further indicates a rigidly enforced caudal region of the cranium7.
The maxilla-dominated facial skeleton of Falcatakely reveals two important insights for the evolutionary history of birds. First, the ancestral developmental patterning of rostrum construction in basal avialans has generated neornithine-like cranial phenotypes that have not been

was not a fixed trait, at least among enantiornithine birds. Thus, consolidation to a premaxilla-dominated rostrum, a hallmark of all living birds, was not an evolutionary prerequisite for rostrum and, therefore, beak enlargement. More generally, this is consistent with a growing appreciation of the flexibility of the underlying developmental mechanisms8,34,35 that may be responsible for the generation of convergent morphologies among distantly related forms. With Falcatakely, this appreciation can now be extended to the deep-time avialan record.
The discovery of Falcatakely expands the ecomorphological potential realized by enantiornithines and Mesozoic birds more generally3,18. This new appreciation of avialan anatomy underscores the potential for considerable variability in trophic ecology during the first great diversification of the group during the Cretaceous period5,6,36,37.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2945-x.

1. Xu, X. et al. An integrative approach to understanding bird origins. Science 346, 1253293 (2014).
2. Zhou, Z., Clarke, J. & Zhang, F. Insight into diversity, body size and morphological evolution from the largest Early Cretaceous enantiornithine bird. J. Anat. 212, 565–577 (2008).
3. Brusatte, S. L., O’Connor, J. K. & Jarvis, E. D. The origin and diversification of birds. Curr. Biol. 25, R888–R898 (2015).
4. O’Connor, J. K. in The Evolution of Feathers (eds Foth, C. & Rauhut, O. W. M.) 147–172 (Springer, 2020).
5. O’Connor, J. K. & Chiappe, L. M. A revision of enantiornithine (Aves: Ornithothoraces) skull morphology. J. Syst. Palaeontol. 9, 135–157 (2011).
6. Huang, J. et al. A new ornithurine from the Early Cretaceous of China sheds light on the evolution of early ecological and cranial diversity in birds. PeerJ 4, e1765 (2016).
7. Bhullar, B.-A. S. et al. How to make a bird skull: major transitions in the evolution of the avian cranium, paedomorphosis, and the beak as a surrogate hand. Integr. Comp. Biol. 56, 389–403 (2016).
8. Young, N. M. et al. Embryonic bauplans and the developmental origins of facial diversity and constraint. Development 141, 1059–1063 (2014).
9. Mayr, G. Comparative morphology of the avian maxillary bone (os maxillare) based on an examination of macerated juvenile skeletons. Acta Zool. 101, 24–38 (2020).
10. Hu, H., O’Connor, J. K. & Zhou, Z. A new species of Pengornithidae (Aves: Enantiornithes) from the Lower Cretaceous of China suggests a specialized scansorial habitat previously unknown in early birds. PLoS ONE 10, e0126791 (2015).
11. Bailleul, A. M. et al. An Early Cretaceous enantiornithine (Aves) preserving an unlaid egg and probable medullary bone. Nat. Commun. 10, 1275 (2019).
12. Hou, L., Chiappe, L. M., Zhang, F. & Chuong, C.-M. New Early Cretaceous fossil from China documents a novel trophic specialization for Mesozoic birds. Naturwissenschaften 91, 22–25 (2004).
13. O’Connor, J. K. et al. Phylogenetic support for a specialized clade of Cretaceous enantiornithine birds with information from a new species. J. Vertebr. Paleontol. 29, 188–204 (2009).
14. O’Connor, J. K., Chiappe, L. M., Gao, C. & Zhao, B. Anatomy of the Early Cretaceous enantiornithine bird Rapaxavis pani. Acta Palaeontol. Pol. 56, 463–475 (2011).
15. O’Connor, J. K., Wang, M. & Hu, H. A new ornithuromorph (Aves) with an elongate rostrum from the Jehol Biota, and the early evolution of rostralization in birds. J. Syst. Palaeontol. 14, 939–948 (2016).
16. Field, D. J. et al. Complete Ichthyornis skull illuminates mosaic assembly of the avian head. Nature 557, 96–100 (2018).
17. Field, D. J., Benito, J., Chen, A., Jagt, J. W. M. & Ksepka, D. T. Late Cretaceous neornithine from Europe illuminates the origins of crown birds. Nature 579, 397–401 (2020).
18. Chiappe, L. M. & Witmer, L. M. Mesozoic Birds: Above the Heads of Dinosaurs (Univ. California Press, 2004).
19. Li, Z., Zhou, Z., Wang, M. & Clarke, J. A. A new specimen of large-bodied basal enantiornithine Bohaiornis from the Early Cretaceous of China and the inference of feeding ecology in Mesozoic birds. J. Paleontol. 88, 99–108 (2014).
20. Wang, M., Hu, H. & Li, Z. A new small enantiornithine bird from the Jehol Biota, with implications for early evolution of avian skull morphology. J. Syst. Palaeontol. 14, 481–497 (2016).
21. Wang, M., O’Connor, J. K. & Zhou, Z. A new robust enantiornithine bird from the Lower Cretaceous of China with scansorial adaptations. J. Vertebr. Paleontol. 34, 657–671 (2014).
22. Rogers, R. R., Hartman, J. H. & Krause, D. W. Stratigraphic analysis of Upper Cretaceous rocks in the Mahajanga Basin, northwestern Madagascar: implications for ancient and modern faunas. J. Geol. 108, 275–301 (2000).
23. Gauthier, J. A. & de Queiroz, K. in New Perspectives on the Origin and Early Evolution of
recognized in the fossil record until now. Second, the developmental Birds: Proceedings of the International Symposium in Honor of John H. Ostrom (eds
reduction of the maxilla previously inferred for Ornithothoraces7,9 Gauthier, J. & Gall, L. F.). 7–41 (Peabody Museum of Natural History, Yale Univ., 2001).

24. Chiappe, L. M., Norell, M. A. & Clark, J. M. A new skull of Gobipteryx minuta (Aves: Enantiornithes) from the Cretaceous of the Gobi Desert. Am. Mus. Novit. 3346, 1–15 (2001).
25. Wang, M. & Zhou, Z. A new enantiornithine (Aves: Ornithothoraces) with completely fused premaxillae from the Early Cretaceous of China. J. Syst. Palaeontol. 17, 1299–1312 (2019).
26. Hieronymus, T. L. & Witmer, L. M. Homology and evolution of avian compound Rhamphothecae. Auk 127, 590–604 (2010).
27. Wang, M., Zhou, Z.-H., O’Connor, J. K. & Zelenkov, N. V. A new diverse enantiornithine family (Bohaiornithidae fam. nov.) from the Lower Cretaceous of China with information from two new species. Vert. Palasiat. 52, 31–76 (2014).
28. Wang, M. & Hu, H. A comparative morphological study of the jugal and quadratojugal in early birds and their dinosaurian relatives. Anat. Rec. 300, 62–75 (2017).
29. Wang, Y. et al. A previously undescribed specimen reveals new information on the dentition of Sapeornis chaoyangensis. Cretac. Res. 74, 1–10 (2017).
30. Hu, H., O’Connor, J. K., Wang, M., Wroe, S. & McDonald, P. G. New anatomical information on the bohaiornithid Longusunguis and the presence of a plesiomorphic diapsid skull in Enantiornithes. J. Syst. Palaeontol. 18, 1481–1495 (2020).
31. Hu, H. et al. Evolution of the vomer and its implications for cranial kinesis in Paraves. Proc. Natl Acad. Sci. USA 116, 19571–19578 (2019).
32. Felice, R. N. & Goswami, A. Developmental origins of mosaic evolution in the avian cranium. Proc. Natl Acad. Sci. USA 115, 555–560 (2018).
33. Bhullar, B.-A. S. et al. A molecular mechanism for the origin of a key evolutionary innovation, the bird beak and palate, revealed by an integrative approach to major transitions in vertebrate history. Evolution 69, 1665–1677 (2015).
34. Mallarino, R. et al. Closely related bird species demonstrate flexibility between beak morphology and underlying developmental programs. Proc. Natl Acad. Sci. USA 109, 16222–16227 (2012).
35. Tokita, M., Yano, W., James, H. F. & Abzhanov, A. Cranial shape evolution in adaptive radiations of birds: comparative morphometrics of Darwin’s finches and Hawaiian honeycreepers. Phil. Trans. R. Soc. Lond. B 372, 20150481 (2017).
36. Bell, A. & Chiappe, L. M. Statistical approaches for inferring ecology in Mesozoic birds. J. Syst. Palaeontol. 9, 119–133 (2011).
37. O’Connor, J. K. The trophic habits of early birds. Palaeogeogr. Palaeoclimatol. Palaeoecol. 513, 178–195 (2019).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© The Author(s), under exclusive licence to Springer Nature Limited 2020

Methods

Temporal and stratigraphic context
UA 10015 was recovered in 2010 at the locality MAD05-42 in the Berivotra Study Area of the Mahajanga Basin Project. The bone-bearing horizon lies within facies 2 of the Anembalemba Member in the Upper Cretaceous (Maastrichtian) Maevarano Formation22 (Supplementary Information). Many specimens, including UA 10015, that were recovered from the Anembalemba Member were entombed by debris flows, which often results in high-quality preservation with only minimal displacement and taphonomic distortion during burial38,39.

Phylogenetic methods
Given the extremely derived condition in Falcatakely and the notable amount of homoplasy among non-avialan paravians and basal avialans, we used a two-tiered dataset approach in an effort to best constrain the phylogenetic affinities of Falcatakely (Supplementary Information). First, we used the densely sampled, coelurosaur-wide matrix from the Theropod Working Group (TWiG)40,41 to broadly assess and confirm the position of Falcatakely among paravians (Extended Data Fig. 3 and Supplementary Information). Next, we used a modified version of a well-established Mesozoic avialan-focused (WEA) matrix25, along with previously described modifications16, to further examine the relationship of Falcatakely among avialans (Fig. 2 and Extended Data Fig. 4). Bayesian inference trees were estimated for each dataset using MrBayes v.3.2 (ref. 42). The standard model (Markov k-state variable model)43 was specified with gamma-distributed rate variation44. A subset of characters was set as ordered, following the previous use of the included datasets. During the analysis, Markov chain Monte Carlo convergence was assessed using the average standard deviation of split frequencies and by examining the trace files in Tracer45. Convergence to stationarity was assumed for split frequencies below 0.01 and effective sample size values >200. All analyses were performed with two runs of four chains each that were run for 10 million generations while sampling parameters every 1,000 generations. The first 25% of samples were discarded as burn-in. Results are summarized using a majority rule consensus (MRC) tree46. MRC trees for both datasets depict Falcatakely as a member of Enantiornithes. The TWiG dataset recovers Falcatakely as the sister taxon to Pengornis, whereas the WEA matrix finds Falcatakely in a large polytomy with other enantiornithines (Extended Data Figs. 3, 4). Given the denser avialan sampling in the WEA dataset, the phylogenetic results from this matrix are used here as the primary results. Clade support was assessed using the estimated posterior probabilities from the Bayesian inference trees. Morphological character support was established for the MRC trees using the map and apo commands in TNT47–49. Additional details for the phylogenetic results and clade support are presented in the Supplementary Information.

To further investigate the robustness of our inferred trees, three sensitivity analyses were performed examining the influence of cranial versus postcranial data and of cranial-only character scorings for select taxa (for example, Archaeopteryx and Sapeornis) on tree inference. These analyses reveal no significant topological alterations relative to the standard analysis described above, lending support to the primary results in which Falcatakely is placed among enantiornithine birds. Moreover, additional explicit hypothesis testing using Bayes factor comparisons was conducted with Falcatakely constrained to stemward positions (for example, with Falcatakely excluded from Pygostylia), which resulted in suboptimal solutions. Details for these analyses and the specifics of the results are provided in the Supplementary Information; executable files for the sensitivity and alternative hypothesis testing are available on DRYAD (https://doi.org/10.5061/dryad.mkkwh70wg).

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
UA 10015 is catalogued into the collections at the Université d’Antananarivo. Details regarding the development of the digital files and the derivatives of these files (such as DICOM or PLY) used as part of the study are included in the Supplementary Information and archived on the MorphoSource website (https://www.morphosource.org/Detail/ProjectDetail/Show/project_id/7894). Phylogenetic character information and parameters used in the analyses are provided in the Supplementary Information. Executable files for phylogenetic analyses, character–taxon matrices, an interactive three-dimensional morphospace plot and interactive three-dimensional PDFs are hosted on DRYAD (https://doi.org/10.5061/dryad.mkkwh70wg). This published study, including the novel genus (urn:lsid:zoobank.org:act:5BA26059-B428-4896-BFEA-2475419C61FC) and species (urn:lsid:zoobank.org:act:69314771-F0D8-4C15-946C-524164385FB7) along with the associated nomenclatural acts, has been registered in ZooBank: urn:lsid:zoobank.org:pub:4595D69E-FE12-4DAD-B155-89F084254F73.

38. Rogers, R. R. Fine-grained debris flows and extraordinary vertebrate burials in the Late Cretaceous of Madagascar. Geology 33, 297–300 (2005).
39. Rogers, R. R., Krause, D. W., Curry Rogers, K., Rasoamiaramanana, A. H. & Rahantarisoa, L. Paleoenvironment and paleoecology of Majungasaurus crenatissimus (Theropoda: Abelisauridae) from the Late Cretaceous of Madagascar. J. Vertebr. Paleontol. 27, 21–31 (2007).
40. Brusatte, S. L., Lloyd, G. T., Wang, S. C. & Norell, M. A. Gradual assembly of avian body plan culminated in rapid rates of evolution across the dinosaur–bird transition. Curr. Biol. 24, 2386–2392 (2014).
41. Turner, A. H., Makovicky, P. J. & Norell, M. A. A review of dromaeosaurid systematics and paravian phylogeny. Bull. Am. Mus. Nat. Hist. 371, 1–206 (2012).
42. Ronquist, F. et al. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61, 539–542 (2012).
43. Lewis, P. O. A likelihood approach to estimating phylogeny from discrete morphological character data. Syst. Biol. 50, 913–925 (2001).
44. Clarke, J. A. & Middleton, K. M. Mosaicism, modules, and the evolution of birds: results from a Bayesian approach to the study of morphological evolution using discrete character data. Syst. Biol. 57, 185–201 (2008).
45. Rambaut, A., Suchard, M. A., Xie, D. & Drummond, A. J. Tracer v.1.6. http://beast.community/tracer (2014).
46. O’Reilly, J. E. & Donoghue, P. C. J. The efficacy of consensus tree methods for summarizing phylogenetic relationships from a posterior sample of trees estimated from morphological data. Syst. Biol. 67, 354–362 (2018).
47. Goloboff, P. A., Farris, J. & Nixon, K. TNT, a free program for phylogenetic analysis. Cladistics 24, 774–786 (2008).
48. Goloboff, P. A., Farris, J. S. & Nixon, K. C. TNT: tree analysis using new technology. version 1.1 (Willi Hennig Society Edition) http://www.lillo.org.ar/phylogeny/tnt/ (2008).
49. Goloboff, P. A. & Catalano, S. A. TNT version 1.5, including a full implementation of phylogenetic morphometrics. Cladistics 32, 221–238 (2016).

Acknowledgements We thank the Université d’Antananarivo, the Mahajanga Basin Project field teams and the villagers of the Berivotra Study Area for support; the ministries of Mines, Higher Education and Culture of the Republic of Madagascar for permission to conduct field research; the National Geographic Society (8597-09) and the US National Science Foundation (EAR–0446488, EAR–1525915, EAR–1664432) for funding; and M. Witton for drafting the line drawings used in Fig. 1 and Extended Data Figs. 1, 2. Collection of avian three-dimensional morphometric data was funded by European Research Council grant no. STG-2014-637171 (to A. Goswami). Full acknowledgments are provided in the Supplementary Information.

Author contributions P.M.O., A.H.T. and J.R.G. designed the project; P.M.O., A.H.T., J.R.G., R.R.R., D.W.K. and L.J.R. conducted the fieldwork. J.R.G. performed the mechanical preparation of the specimen; J.R.G. and P.M.O. conducted the digital preparation and interpretation of the specimen using microcomputed tomography and carried out the rapid prototyping of UA 10015; R.R.R. and L.J.R. provided geological data and taphonomic interpretation; P.M.O., A.H.T., J.R.G. and R.N.F. completed the laboratory work on and digital representation of the fossil and provided input on descriptions and comparisons; A.H.T. and P.M.O. contributed to the character coding and phylogenetic analysis; R.N.F. completed the morphometric analyses; P.M.O., A.H.T. and J.R.G. developed the manuscript, with contributions and/or editing from all authors.

Competing interests The authors declare no competing interests.

Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2945-x.
Correspondence and requests for materials should be addressed to P.M.O.
Peer review information Nature thanks Bhart-Anjan Bhullar and Daniel Field for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
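
As an illustration of the sampling scheme reported under Phylogenetic methods (two runs, 10 million generations, sampling every 1,000 generations, 25% burn-in), the following minimal Python sketch works out how many posterior samples that scheme would retain and applies the two stated convergence thresholds. It is not the authors' code: the class and function names are hypothetical, the diagnostic values passed to `converged` are invented for illustration, and the count assumes one sample per sampling interval (a program that also records the starting state would report one more per run).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class McmcDesign:
    """Sampling scheme as described in the Methods text (names are hypothetical)."""
    runs: int = 2
    chains_per_run: int = 4
    generations: int = 10_000_000
    sample_every: int = 1_000
    burnin_fraction: float = 0.25

    def samples_per_run(self) -> int:
        # One sample per sampling interval; the starting state is not counted.
        return self.generations // self.sample_every

    def retained_per_run(self) -> int:
        # Samples left in each run after discarding the burn-in fraction.
        discarded = int(self.samples_per_run() * self.burnin_fraction)
        return self.samples_per_run() - discarded

    def retained_total(self) -> int:
        # Post-burn-in samples pooled across the independent runs.
        return self.runs * self.retained_per_run()


def converged(asdsf: float, min_ess: float,
              asdsf_max: float = 0.01, ess_min: float = 200.0) -> bool:
    """Apply the two stated criteria: split-frequency deviation below 0.01
    and effective sample size above 200."""
    return asdsf < asdsf_max and min_ess > ess_min


design = McmcDesign()
print(design.samples_per_run())   # 10000
print(design.retained_per_run())  # 7500
print(design.retained_total())    # 15000

# Hypothetical diagnostics, for illustration only.
print(converged(asdsf=0.005, min_ess=350.0))  # True
```

Under these assumptions each run contributes 7,500 post-burn-in samples, so the consensus tree would summarize 15,000 sampled trees in total.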
Extended Data Fig. 1 | Rostrum of the Cretaceous enantiornithine bird Falcatakely (UA 10015, holotype). a, Reconstruction (not to scale) illustrating the preserved (in white) elements of the cranium. b, Digital polygon surface reconstruction (from microcomputed tomography scans) of the right nasal in rostrodorsal view (caudal to the top) highlighting the midline depression and dimpled surface texture. c, Digital polygon surface reconstruction of the right nasal in dorsal view illustrating the dimpled architecture on the frontal and rostral portions, which extends laterally onto the lacrimal. d, Digital polygon surface reconstruction of the right facial elements in right lateral view to illustrate the shape and inter-element relationships of the nasal, maxilla and lacrimal (note the surface texture of the right maxilla with neurovascular sulci broadly expressed over the lateral surface, deep to the inferred keratinous covering (that is, beak)). e, Digital polygon surface reconstruction of the lower lateral face to highlight arrangement of the maxilla, lacrimal, jugal and postorbital (all elements from the right side). f, Digital polygon surface reconstruction of left maxilla and premaxilla articulation (rostral to the left). AOF, antorbital fenestra; cdp, caudodorsal process of the lacrimal; cp, choanal process of the palatine; ect, ectopterygoid; EN, external nares; ITF, infratemporal fenestra; fpn, frontal process of the nasal; inb, internarial bar; jpmx, jugal process of the maxilla; ju, jugal; lbo, lacrimal boot; lc, lacrimal; ld, lacrimal dimpling; le, lacrimal excavation; lf, lacrimal foramen; mpmx, midline premaxilla; mx, maxilla; mxpj, maxillary process of the jugal; na, nasal; nd, nasal dimpling; nf, nasal fossa; nvs, neurovascular sulci; pal, palatine; pmpm, premaxillary process of the maxilla; pmx, premaxilla; po, postorbital; qj, quadratojugal; rdp, rostrodorsal process of the lacrimal; rpn, rostral process of the nasal; tm, tomial margin; to, tooth; vr, ventral ramus of the lacrimal.
Extended Data Fig. 2 | Palatal and lateral facial regions of the Cretaceous enantiornithine bird Falcatakely (UA 10015, holotype). a, Digital polygon surface reconstruction (from microcomputed tomography scans) of the palate and lateral face in ventral view. b, Reconstructed outline drawing of Falcatakely in palatal view (shaded regions are not preserved). c, Digital polygon surface reconstruction of internal aspect of left facial skeleton (premaxilla, maxilla and nasal) and palate in right lateral view. The left and right sides are indicated as (l) and (r), respectively. The dashed line in c represents the approximate contour of the caudal margin (that is, the ventral ramus of the lacrimal) of the antorbital fenestra. Scale bar, 5 mm; the scale bar is representative for a and c; the reconstruction in b is not to the same scale. AOF, antorbital fenestra; bs, basisphenoid rostrum; cp, choanal process of the (right) palatine; ect, ectopterygoid; EN, external nares; jpmx, jugal process of the maxilla; mpmx, midline premaxilla; mx, maxilla; na, nasal; pal, palatine; pmx, premaxilla; pter, pterygoid; to, tooth; up, uncinate process of the ectopterygoid; vm, vomers.
Extended Data Fig. 3 | Majority-rule tree of Falcatakely among coelurosaurians from the Bayesian analysis of the TWiG matrix. Clades outside of the Avialae are collapsed for brevity. Posterior probabilities are placed above the nodes.
Extended Data Fig. 4 | Majority-rule tree of Falcatakely among avialans from the Bayesian analysis of a modified matrix that was previously published. A matrix modified from a previous study25 was used. Posterior probabilities are placed above the nodes.
Extended Data Fig. 5 | Geometric morphometric analysis of rostrum shape in Falcatakely among avians. Plot of the first two principal components of the three-dimensional landmark analysis of total rostrum shape of Falcatakely and extant avian taxa. Whereas the unique configuration of the maxilla and premaxilla in Falcatakely is more similar to those of non-avialan paravians (Fig. 3), the overall three-dimensional rostrum phenotype occupies the morphospace that is converged on by subsequent radiations of neornithine birds (Supplementary Data). See Supplementary Information for analytical protocols.
Extended Data Fig. 6 | Landmarking procedure for three-dimensional geometric morphometric analysis in dorsal and lateral views. a, Dorsal view.
b, Lateral view. Red spheres represent anatomical (type I) landmarks; yellow spheres are sliding semi-landmarks.
Corresponding author(s): Patrick M. O'Connor
Reporting Summary

Software and code

Data collection Avizo 7.1 (VSG), 9.2 (FEI/Thermo-Fisher Scientific), and Avizo Lite 2019 (ThermoScientific); ImageJ v1.48 (National Institutes of Health);
Checkpoint (Stratovan); Blender v2.79b.
Data analysis MrBayes v3.2; TNT 1.5; Tracer v1.6; Geomorph (R package), Adams and Otárola-Castillo, 2013; StereoMorph (R package); Avizo 7 (VSG), 9
(FEI/Thermo-Fisher Scientific), and Avizo Lite 2019 (ThermoScientific); Animation Producer in Avizo; Adobe Acrobat Pro DC (Continuous
Release) Version 2020, Adobe Premiere Pro (Creative Cloud edition).
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and
reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Data
UA 10015 is cataloged into the collections at the Université d’Antananarivo. Details regarding digital file development and derivatives of files (e.g., DICOM, PLY) used as part of the study are included in the Supplementary Information and archived on the MorphoSource website (https://www.morphosource.org/Detail/ProjectDetail/Show/project_id/7894). Phylogenetic character information and parameters used in the analyses are provided in the Supplementary Information. Executable files for phylogenetic analyses, interactive 3D morphospace plot, and interactive 3D PDFs are hosted on DRYAD: https://doi.org/10.5061/dryad.mkkwh70wg. This published work, including the novel genus (urn:lsid:zoobank.org:act:5BA26059-B428-4896-BFEA-2475419C61FC) and species (urn:lsid:zoobank.org:act:69314771-F0D8-4C15-946C-524164385FB7) along with the associated nomenclatural acts, has been registered in ZooBank: urn:lsid:zoobank.org:pub:4595D69E-FE12-4DAD-B155-89F084254F73.

Ecological, evolutionary & environmental sciences study design

Study description Descriptive/comparative study of a new fossil bird (Falcatakely forsterae) from the Late Cretaceous of Madagascar.
Research sample This single cranium of Falcatakely (UA 10015) is the only known material of this taxon thus far discovered; it is known exclusively from
the Upper Cretaceous (Maastrichtian) of northwestern Madagascar.
Sampling strategy The study developed herein involves the description of a new taxon based on direct observation, light microscopy, and microcomputed tomography of the fossil represented by this single cranium. Digital preparation allowed for a complete analysis and reconstruction of individual elements of the cranium.
Data collection The holotype of Falcatakely forsterae (UA 10015) was collected from locality MAD05-42 by hand quarrying (ice pick, brush, rock
hammer), with subsequent emplacement in a plaster jacket prior to removal for laboratory processing. Mechanical and digital
preparation of the fossil was completed by J.R. Groenke, with interpretation of the anatomy (both of the fossil itself and digital
reconstructions/interpretations) by P.M. O'Connor, A.H. Turner, and J.R. Groenke. R.N. Felice led the morphometric analyses
included herein. Character scorings were assessed by A.H. Turner and P.M. O'Connor, with phylogenetic analyses completed by A.H. Turner.
Timing and spatial scale The specimen was originally collected during the 2010 calendar year, but only initially prepped and CT scanned (medical CT scanner
of the plaster jacket) that same year, yielding an ambiguous identification. J.R. Groenke (Ohio University) did additional mechanical
preparation in March 2017, immediately followed by a high-resolution microCT scan in April 2017. Intensive digital preparation then
ensued between April 2017 and January 2018, with subsequent, albeit intermittent, digital preparation, interpretation and
refinement of models through January 2019.
Data exclusions No data were excluded.
Reproducibility Not applicable; given that this paper focuses on a single specimen thus far known to humankind, it does not fall into the category for
being reproducible. However, the datasets assembled for this study are publicly available for future reanalyses by other workers.
Randomization Not Applicable.
Blinding Not applicable
Did the study involve field work? Yes No
Field work, collection and transport

Field conditions Fieldwork was conducted during the austral summer (i.e., the dry season) in 2010 in the Mahajanga Basin, near the village of
Berivotra, Madagascar.
Location The holotypic specimen was collected from the Upper Cretaceous Maevarano Formation, Mahajanga Basin, Madagascar.
Approximate coordinates: S 15 degrees, 54' 20.94", E 46 degrees, 35' 00.23"
Access & import/export The specimen was collected under a Collaborative Agreement with the University of Antananarivo and various ministries (Ministry of Mines, Ministry of Higher Education) of the Madagascar government. Permits from the Ministry of Mines (Scientific Studies Authorization No 005/2010) and the Ministry of Higher Education/University of Antananarivo (No 76 PAB/10, Supporting documentation: Scientific Authorization Studies No 007/2010, 005/2010, 006/2010, 009/2010) used in support of field research were issued on 17 June 2010 and 18 June 2010, respectively.
Disturbance This study involved minimal disturbance to the environment, as the fossil-bearing layer was within 0.70 meters of the surface in the
locality.
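
The approximate coordinates given under Location are written in degrees, minutes and seconds. A minimal Python sketch of the standard conversion to signed decimal degrees follows, in case the locality needs to be plotted or entered into mapping software. The function name is hypothetical, and the sign convention (south and west negative) is the usual one rather than anything stated in the form.

```python
def dms_to_decimal(hemisphere: str, degrees: int, minutes: int, seconds: float) -> float:
    """Convert degrees/minutes/seconds to signed decimal degrees.

    South and west hemispheres are returned as negative values.
    """
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere.upper() in ("S", "W") else value


# Coordinates as reported: S 15 degrees 54' 20.94", E 46 degrees 35' 00.23"
latitude = dms_to_decimal("S", 15, 54, 20.94)
longitude = dms_to_decimal("E", 46, 35, 0.23)
print(f"{latitude:.6f}, {longitude:.6f}")  # -15.905817, 46.583397
```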

Palaeontology and Archaeology

Specimen provenance The holotype specimen (UA 10015) was recovered from locality MAD05-42 from the Upper Cretaceous Maevarano Formation, Mahajanga Basin, Madagascar. Permits from the Ministry of Mines (Scientific Studies Authorization No 005/2010) and the Ministry of Higher Education/University of Antananarivo (No 76 PAB/10, Supporting documentation: Scientific Authorization Studies No 007/2010, 005/2010, 006/2010, 009/2010) used in support of field research were issued on 17 June 2010 and 18 June 2010, respectively.
Specimen deposition The holotype specimen of Falcatakely forsterae is reposited in the University of Antananarivo (UA), Madagascar, with the collection number UA 10015.
Dating methods No new dates were obtained for this contribution; age constraint for the Maevarano Fm. is developed in Rogers et al. 2000.
Tick this box to confirm that the raw and calibrated dates are available in the paper or in Supplementary Information.
Ethics oversight Fossil collection and exportation were completed in compliance with permits issued by the Ministry of Mines (United Republic of
Madagascar) and through a Collaborative Agreement with the University of Antananarivo and various ministries (Ministry of Mines,
Ministry of Higher Education) of the Madagascar government.
Note that full information on the approval of the study protocol must also be provided in the manuscript.
Article
Multiple wheat genomes reveal global variation in modern breeding

https://doi.org/10.1038/s41586-020-2961-x
Accepted: 9 September 2020
Open access

Sean Walkowiak1,2,41, Liangliang Gao3,41, Cecile Monat4,41, Georg Haberer5,
Mulualem T. Kassa6, Jemima Brinton7, Ricardo H. Ramirez-Gonzalez7, Markus C. Kolodziej8,
Emily Delorean3, Dinushika Thambugala9, Valentyna Klymiuk1, Brook Byrns1,
Heidrun Gundlach5, Venkat Bandi10, Jorge Nunez Siri10, Kirby Nilsen1,11, Catharine Aquino12,
Axel Himmelbach4, Dario Copetti13,14, Tomohiro Ban15, Luca Venturini16, Michael Bevan7,
Bernardo Clavijo17, Dal-Hoe Koo3, Jennifer Ens1, Krystalee Wiebe1, Amidou N’Diaye1,
Allen K. Fritz3, Carl Gutwin10, Anne Fiebig4, Christine Fosker17, Bin Xiao Fu2,
Gonzalo Garcia Accinelli17, Keith A. Gardner18, Nick Fradgley18, Juan Gutierrez-Gonzalez19,
Gwyneth Halstead-Nussloch13, Masaomi Hatakeyama12,13, Chu Shin Koh20, Jasline Deek21,
Alejandro C. Costamagna22, Pierre Fobert6, Darren Heavens17, Hiroyuki Kanamori23,
Kanako Kawaura15, Fuminori Kobayashi23, Ksenia Krasileva17, Tony Kuo24,25, Neil McKenzie7,
Kazuki Murata26, Yusuke Nabeka26, Timothy Paape13, Sudharsan Padmarasu4,
Lawrence Percival-Alwyn18, Sateesh Kagale6, Uwe Scholz4, Jun Sese25,27, Philomin Juliana28,
Ravi Singh28, Rie Shimizu-Inatsugi13, David Swarbreck17, James Cockram18, Hikmet Budak29,
Toshiaki Tameshige15, Tsuyoshi Tanaka23, Hiroyuki Tsuji15, Jonathan Wright17, Jianzhong Wu23,
Burkhard Steuernagel7, Ian Small30, Sylvie Cloutier31, Gabriel Keeble-Gagnère32,
Gary Muehlbauer19, Josquin Tibbets32, Shuhei Nasuda26, Joanna Melonek30, Pierre J. Hucl1,
Andrew G. Sharpe20, Matthew Clark16, Erik Legg33, Arvind Bharti33, Peter Langridge34,
Anthony Hall17, Cristobal Uauy7, Martin Mascher4,35, Simon G. Krattinger8,36,
Hirokazu Handa23,37, Kentaro K. Shimizu13,15, Assaf Distelfeld38, Ken Chalmers34,
Beat Keller8, Klaus F. X. Mayer5,39, Jesse Poland3, Nils Stein4,40, Curt A. McCartney9 ✉,
Manuel Spannagl5 ✉, Thomas Wicker8 ✉ & Curtis J. Pozniak1 ✉

Advances in genomics have expedited the improvement of several agriculturally
important crops but similar efforts in wheat (Triticum spp.) have been more
challenging. This is largely owing to the size and complexity of the wheat genome1,
and the lack of genome-assembly data for multiple wheat lines2,3. Here we generated
ten chromosome pseudomolecule and five scaffold assemblies of hexaploid wheat to
explore the genomic diversity among wheat lines from global breeding programs.
Comparative analysis revealed extensive structural rearrangements, introgressions
from wild relatives and differences in gene content resulting from complex breeding
histories aimed at improving adaptation to diverse environments, grain yield and
quality, and resistance to stresses4,5. We provide examples outlining the utility of these
genomes, including a detailed multi-genome-derived nucleotide-binding leucine-rich
repeat protein repertoire involved in disease resistance and the characterization of
Sm1 (ref. 6), a gene associated with insect resistance. These genome assemblies will provide
a basis for functional gene discovery and breeding to deliver the next generation of
modern wheat cultivars.
Wheat is a staple food across all parts of the world and is one of the most widely grown and consumed crops7. As the human population continues to grow, wheat production must increase by more than 50% over current levels by 2050 to meet demand7. Efforts to increase wheat production may be aided by comprehensive genomic resources from global breeding programs to identify within-species allelic diversity and determine the best allele combinations to produce superior cultivars2,8.
Two species dominate current global wheat production: allotetraploid (AABB) durum wheat (Triticum turgidum ssp. durum), which is used to make couscous and pasta9, and allohexaploid (AABBDD) bread wheat (Triticum aestivum), used for making bread and noodles. A, B and D in these designations correspond to separate subgenomes derived from three ancestral diploid species with similar but distinct genome structure and gene content that diverged between 2.5 and 6 million years ago10. The large genome size (16 Gb for bread wheat), high sequence similarity between subgenomes and abundance of repetitive elements (about 85% of the genome) hampered early wheat genome-assembly efforts3. However, chromosome-level assemblies have recently become available for both tetraploid11,12 and hexaploid wheat1,13. Although these genome assemblies are valuable resources,
*A list of affiliations appears at the end of the paper.

they do not fully capture within-species genomic variation that can be used for crop improvement, and comparative genome data from multiple individuals is still needed to expedite bread wheat research and breeding. Until now, comparative genomics of multiple bread wheat lines has been limited to exome-capture sequencing⁴,⁵,¹⁴, low-coverage sequencing² and whole-genome scaffolded assemblies¹³,¹⁵–¹⁷. Here we report multiple reference-quality genome assemblies and explore genome variation that, owing to past breeder selection, differs greatly between bread wheat lines. These genome assemblies usher in a new era for bread wheat and equip researchers and breeders with the tools needed to improve bread wheat and meet future food demands.

Global variation in wheat genomes

To expand on the genome assembly of wheat for Chinese Spring¹, we generated ten reference-quality pseudomolecule assemblies (RQAs) and five scaffold-level assemblies of hexaploid wheat (Supplementary Note 1, Supplementary Tables 1–3). For each RQA, we performed de novo assembly of contigs (contig N50 > 48 kb) that were combined into scaffolds (N50 > 10 Mb) spanning more than 14.2 Gb (Supplementary Note 1). The completeness of the genomes was supported by a universal single-copy orthologue (BUSCO) analysis that identified more than 97% of the expected gene content in each genome (Supplementary Note 1). More than 94% of the scaffolds were ordered, oriented and curated using 10X Genomics linked reads and three-dimensional chromosome conformation capture sequencing (Hi-C) to generate 21 pseudomolecules, as done previously for wheat¹,¹² and barley (Hordeum vulgare)¹⁸. The size and structure of the genomes were similar to that of Chinese Spring, and we observed high collinearity between the pseudomolecules (Extended Data Fig. 1). We also independently validated the scaffold placement and orientation in the pseudomolecule assembly of CDC Landmark by Oxford Nanopore long-read sequencing (Extended Data Fig. 2a, Supplementary Note 2). To complement the RQAs, we generated scaffold-level assemblies of five additional bread wheat lines (Supplementary Note 1). To determine the global context of the 15 assemblies, we combined our data with existing datasets⁴,⁵,¹⁹ (Fig. 1a, Supplementary Table 4). The genetic relationships were in agreement with those reported in previous studies⁴,⁵ and reflected pedigree, geographical location and growth habit (that is, spring versus winter type). There was also a clear separation between the newly assembled genomes and Chinese Spring, indicating that they capture geographical and historical variation not represented in the Chinese Spring assembly.

Polyploidy and CNV drive gene diversification

Single-nucleotide polymorphisms (SNPs), insertions or deletions (indels), presence/absence variation (PAV) and gene copy number variation (CNV) influence agronomically important traits. This is particularly true for polyploid species such as wheat, in which gene redundancy can buffer the effect of genome variation¹⁷. To assess gene content, we projected around 107,000 high-confidence gene models from Chinese Spring¹ onto the RQAs (Supplementary Note 3). The total number of projected genes exhibited a narrow range, between 118,734 and 120,967 (Supplementary Table 5). We identified orthologous groups among projected genes and used the alignment of the orthologous groups to examine SNPs in coding sequences (Supplementary Note 3). The peak positions of nucleotide diversity across the three subgenomes were highly similar to those reported in previous studies²⁰, supporting a strong representation of breeding diversity within the RQAs (Extended Data Fig. 3a, b). The correlation of synonymous nucleotide diversity π (r = 0.11–0.29) and Tajima’s D (r = 0.02–0.06) between homeologues was low (Supplementary Tables 6–8). This suggested that polyploidization increased the number of targets of selection and contributed to broad adaptation of bread wheat, as in wild polyploid plant species²⁰–²². Further investigation of orthologous groups indicated that 88.1% were unambiguous (clusters containing at most one member in each cultivar) (Extended Data Fig. 3c, Supplementary Table 5). Orthologous groups comprising exactly one gene in each line (‘complete’) were the most frequent (approximately 73.5% of genes per cultivar), suggesting strong retention of orthologous genes within the ten RQAs. The residual genes represented either singleton genes with no reciprocal best BLAST hits or genes located in complex clusters in at least one cultivar. Roughly 12% of genes showed PAVs, and their clustering resulted in relationships (Fig. 1b) that were consistent with SNP-based phylogenetic similarities (Fig. 1a). In addition, approximately 26% of the projected genes were found in tandem duplications, indicating that CNV is a strong contributor to genetic variation in wheat.

To provide an example of gene expansion on emerging breeding targets, we performed a more detailed analysis of the restorer of fertility (Rf) gene families (Supplementary Note 4). Rf genes are involved in restoring pollen fertility in hybrid breeding programs²³, and we identified a previously undescribed clade within the mitochondrial transcription termination factor (mTERF) family (Supplementary Table 9), which has recently been implicated in fertility restoration in barley²⁴. Of note, this clade shows evolutionary patterns similar to those of Rf-like pentatricopeptide repeat (PPR) proteins, representatives of which are associated with Rf3, a major locus used in hybrid wheat breeding programs (Extended Data Fig. 4). Although wheat is currently not a hybrid crop, there is substantial interest in Rf genes and their potential application in hybrid wheat production systems²⁵. To our knowledge, no Rf genes have been cloned in wheat and our analysis of Rf genes in multiple RQAs and identification of an Rf clade in wheat is an important step forward in tackling the challenges of hybrid wheat breeding.

The wheat NLR repertoire

To further exemplify the use of multi-genome comparisons for characterizing agronomically relevant gene families, we examined gene expansion in nucleotide-binding leucine-rich repeat (NLR) proteins, which are major components of the innate immune system and are often causal genes for disease resistance in plants²⁶,²⁷. We performed de novo annotation of loci that contain conserved NLR motifs (NB-ARC–leucine-rich repeat) and identified around 2,500 loci with NLR signatures in each RQA (Supplementary Tables 10, 11). A redundancy analysis showed that only 31–34% of the NLR signatures are shared across all genomes, and the number of unique signatures ranged from 22 to 192 per wheat cultivar. We estimated the number of unique NLR signatures that can be detected by incrementally adding more wheat genomes to the dataset; this revealed that 90% of the NLR complement is reached at between 8 (considering 95% sequence identity) and 11 wheat lines (considering 100% protein sequence identity) (Fig. 1c). The total NLR complement of all wheat lines consisted of 5,905 (98% identity) to 7,780 (100% identity) unique NLR signatures, highlighting the size and complexity of the repertoire of receptors involved in disease resistance.

Transposon signatures identify introgressions

Transposable elements make up a large majority of the wheat genome and have a critical role in genome structure and gene regulation. We characterized the overall transposable element content (81.6%) and its composition (69% long terminal-repeat retrotransposons (LTR) and 12.5% DNA transposons) in the RQAs (Supplementary Table 5). Across all RQAs, we annotated 1.22 × 10⁶ full length (fl)-LTRs, which clustered lines into the same groups we observed from our analysis of PAV and SNPs (Fig. 1a, b, Extended Data Fig. 3d). Generally, unique fl-LTRs (147,450) were young (median of 0.9 million years) and were enriched in the highly recombining, more distal chromosomal regions (Fig. 1d). By contrast, shared fl-LTRs were older (median of 1.3 million years) and were more evenly distributed across the pericentric regions (Fig. 1d). The

[Fig. 1 image, panels a–d. a, Principal component 1 versus Principal component 2, with the assembled lines labelled. b, Similarity in PAV (dendrogram of the RQA lines). c, Unique NLRs versus Number of lines, at 100%, 98% and 95% sequence identity. d, LTR-retrotransposon density (Low to High): Insertion time (Myr) versus Chromosomal location (% length), one row per Number of lines (Unique to 11).]
Fig. 1 | Patterns of variation in the wheat genome. a, Principal component analysis of polymorphisms from exome-capture sequencing of about 1,200 lines (grey markers), 16 lines from whole-genome shotgun resequencing (orange markers) and our new assemblies (black markers). Text colours reflect different geographical locations and winter or spring growth. b, Dendrogram of pairwise Jaccard similarities for gene PAV between all RQA assemblies. c, Number of unique NLRs at different per cent identity cut-offs as the number of genomes increases. Dashed vertical lines represent 90% of the NLR complement. Markers indicate the mean values of all permutations of the order of adding genomes. Whiskers show maximum and minimum values based on one million random permutations. d, Chromosomal location versus insertion age distribution of unique to (reading downward) increasingly shared syntenic full-length LTR retrotransposons.
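The permutation procedure described for Fig. 1c (mean, minimum and maximum numbers of unique NLRs over random orders of adding genomes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `accumulation_curve` and `lines_to_reach`, the input layout (one set of signature IDs per line) and the toy data are all hypothetical.

```python
import random
from statistics import mean


def accumulation_curve(signatures_by_line, n_permutations=1000, seed=0):
    """Return (means, minima, maxima) of cumulative unique-signature counts.

    signatures_by_line maps a line name to the set of signature IDs found in it.
    Index k of each returned list corresponds to k + 1 genomes added.
    """
    rng = random.Random(seed)
    lines = list(signatures_by_line)
    counts = [[] for _ in lines]  # counts[k] collects totals after k + 1 genomes
    for _ in range(n_permutations):
        order = lines[:]
        rng.shuffle(order)  # one random order of adding genomes
        seen = set()
        for k, name in enumerate(order):
            seen |= signatures_by_line[name]
            counts[k].append(len(seen))
    return ([mean(c) for c in counts],
            [min(c) for c in counts],
            [max(c) for c in counts])


def lines_to_reach(means, fraction=0.9):
    """Smallest number of genomes whose mean count reaches `fraction` of the total."""
    target = fraction * means[-1]
    for k, value in enumerate(means, start=1):
        if value >= target:
            return k
    return len(means)


# Toy data: hypothetical signature IDs, not taken from the paper.
toy = {
    "line_A": {"n1", "n2", "n3"},
    "line_B": {"n1", "n2", "n4"},
    "line_C": {"n1", "n5"},
    "line_D": {"n1", "n2", "n3", "n6"},
}
means, minima, maxima = accumulation_curve(toy, n_permutations=200, seed=1)
print(means, minima, maxima, lines_to_reach(means))
```

Sampling a fixed number of shuffles, rather than enumerating every order, keeps the cost manageable as the number of lines grows; the caption reports one million random permutations, whereas the sketch defaults to far fewer.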
RLC-Angela fl-LTRs were the most abundant (21,000–27,000 full-length copies per genome) and analysis of variant patterns identified several chromosomal segments that contained numerous unique or rare retrotransposon insertions (Extended Data Fig. 5), which, on the basis of breeding history, we hypothesize to represent introgressions. For example, the LongReach Lancer RQA revealed two unique regions, a pericentric region on chromosome 2B and a segment on the end of chromosome 3D (Fig. 2a, b), both of which affect chromosome length (Extended Data Fig. 5). We used pedigree analysis to postulate the source of the introgressions and performed whole-genome sequencing of multiple accessions of putative donors. LongReach Lancer carries the stem rust resistance gene Sr36, derived from an introgression from Triticum timopheevii, and the resistance genes Lr24 (leaf rust) and Sr24 (stem rust), derived from tall wheatgrass²⁸,²⁹ (Thinopyrum ponticum). We generated whole-genome sequence reads from multiple T. ponticum and T. timopheevii accessions (Supplementary Table 12) and alignment to the LongReach Lancer RQA confirmed a T. ponticum introgression spanning a region of approximately 60 Mb of chromosome 3D (Fig. 2a), whereas T. timopheevii aligned to the majority (427 Mb) of chromosome 2B (Fig. 2b). Overall, we identified 341 chromosomal segments larger than 20 Mb with unique or rare fl-LTR insertion patterns that were present in only 1 to 4 of the RQA genomes, of which 273 insertion patterns were uniquely associated with a single genome (Supplementary Tables 13–16). The majority of unique regions were in PI190962 (spelt wheat; Triticum aestivum ssp. spelta), which was expected, given that it diverged from modern bread wheat several thousand years ago.

A similar strategy was used to confirm RLC-Angela variation at the telomeric region of chromosome 2A in Jagger, Mace, SY Mattis and CDC Stanley (Fig. 2c), which corresponds to the 2NvS introgression from Aegilops ventricosa (Supplementary Note 5). This introgression is a well-known source of resistance to wheat blast³⁰, and contains the Lr37–Yr17–Sr38 gene cluster, which provides resistance to several rust diseases³¹. Sequencing of A. ventricosa accessions (Supplementary Table 12) followed by comparison of chromosomes with the RQAs confirmed that Jagger, Mace, SY Mattis and CDC Stanley carry the 2NvS introgression, which spans about 33 Mb on chromosome 2A (Fig. 2c, Extended Data Fig. 6a). We annotated the coding genes within this region and identified 535 high-confidence genes; more than 10% were predicted to be associated with disease resistance, including genes that encode putative NB-ARC and NLRs (Extended Data Fig. 6b, Supplementary Tables 17, 18). Furthermore, we used genotyping by sequencing to detect the 2NvS segment in three wheat panels and discovered that its frequency has been increasing in breeding germplasm and its presence is consistently associated with higher grain yield (Extended Data

[Fig. 2 image, panels a–g. a, LongReach Lancer chromosome 3D, tracks i–iv, T. ponticum. b, LongReach Lancer chromosome 2B, tracks i–iv, T. timopheevii. c, Jagger chromosome 2A, tracks i–iv, A. ventricosa. Legends: RLC_Angela, No. of lines (1 to 4); Read depth (Min. to Max.). d, Julius chromosome 4D position (Mb) versus Chinese Spring chromosome 4D position (Mb), with the centromere shift marked. e, Chinese Spring (5B/7B non-carrier) and ArinaLrFor (5B/7B carrier), showing chromosome arms 5BS, 5BL, 7BS and 7BL and the translocation breakpoint. f, SY Mattis cytology (5B/7B carrier); scale bar, 10 μm. g, SY Mattis Hi-C (5B/7B carrier) and Norin 61 Hi-C (5B/7B non-carrier): Chromosome 7B (Mb) versus Chromosome 5B (Mb).]
Fig. 2 | Introgressions and large-scale structural variation in wheat. a–c, T. ponticum introgression on chromosome 3D in LongReach Lancer (a), T. timopheevii introgression on chromosome 2B in LongReach Lancer (b) and A. ventricosa introgression on chromosome 2A in Jagger (c). Track i, map of polymorphic RLC-Angela retrotransposon insertions (legend at bottom); track ii, density of projected gene annotations from Chinese Spring (blue bars, scaled to maximum value); track iii, per cent identity to Chinese Spring based on chromosome alignment (yellow; scale is 0–100%); track iv, read depth of wheat wild relatives (blue–yellow heat map; legend at bottom). d, Dot plot alignment showing chromosome-level collinearity (black) with relative density of CENH3 ChIP–seq mapped to 100-kb bins for Chinese Spring (blue) and Julius (red); the arrow indicates a centromere shift. e, Robertsonian translocation between chromosomes 5B and 7B in ArinaLrFor. f, g, Cytology (f) and Hi-C (g) confirm the 5B/7B translocation in SY Mattis (left) compared with the non-carrier Norin 61 (right). In f, five independent cells were observed; the translocation was confirmed independently ten times. Scale bar, 10 μm.
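Panel d of Fig. 2 plots the relative density of CENH3 ChIP–seq reads in 100-kb bins. The sketch below shows one simple way such a binned, max-scaled density could be computed and how the densest bin in two lines could be compared. It assumes read positions are already mapped to a shared coordinate system, and taking the single densest bin as a centromere proxy is a simplification made here, not the procedure used in the paper. The function names and read positions are invented for illustration.

```python
from collections import Counter

BIN_SIZE = 100_000  # 100-kb bins, as in the Fig. 2d caption


def relative_density(read_positions, bin_size=BIN_SIZE):
    """Map bin index -> read count scaled so the densest bin equals 1.0."""
    counts = Counter(pos // bin_size for pos in read_positions)
    if not counts:
        return {}
    top = max(counts.values())
    return {b: n / top for b, n in counts.items()}


def peak_position_mb(density, bin_size=BIN_SIZE):
    """Midpoint (in Mb) of the densest bin; ties resolve to the lowest bin.

    `density` must be non-empty.
    """
    best = min(density, key=lambda b: (-density[b], b))
    return (best + 0.5) * bin_size / 1e6


# Invented read positions (bp) for two lines on one chromosome.
reads_line_1 = [210_010_000, 210_020_000, 210_090_000, 180_000_000]
reads_line_2 = [185_030_000, 185_040_000, 185_050_000, 210_000_000]

shift_mb = peak_position_mb(relative_density(reads_line_1)) - peak_position_mb(
    relative_density(reads_line_2)
)
print(f"peak shift: {shift_mb:.1f} Mb")
```

A real analysis would normally smooth across neighbouring bins and control for mappability before calling a centromere interval; the sketch only illustrates the binning and scaling step.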
Fig. 6c, d, Supplementary Tables 19, 20). Of note, we identified about 60 genes belonging to the cytochrome P450 superfamily, which have been implicated in abiotic and biotic stress tolerance³² and have been functionally validated to influence grain yield in wheat³³. Together, these data indicate that the modern wheat gene pool contains many chromosomal segments of diverse ancestral origins, which can be identified by their transposable-element signatures. We also confirmed the wild-relative origins of three introgressions within the RQA assemblies, a first step towards characterizing causal genes for breeding targets, such as resistance to wheat blast and rust fungi.

Centromere dynamics

Centromeres are vital for cell division and chromosome pairing during meiosis. In plants, functional centromeres are defined by the epigenetic placement of the modified histone CENH3³⁴. We therefore used CENH3 chromatin immunoprecipitation and sequencing (ChIP–seq)³⁵ to determine the positions and sizes (about 7.5–9.6 Mb) of the centromeres for each RQA (Supplementary Tables 21, 22), which were consistent with previous estimates for wheat¹. Furthermore, all chromosomes showed a single active site, implying that previous reports of multiple active centromeres in Chinese Spring¹ were artefacts of misoriented scaffolds. However, we found examples in which the relative position of the centromere was shifted owing to several pericentric inversions, including inversions on chromosomes 4B and 5B (Extended Data Fig. 7a, b). We also observed one instance in which the centromeric position changed, but was not associated with a structural event. Specifically, on chromosome 4D in Chinese Spring, the centromere is shifted by around 25 Mb relative to the consensus position (Fig. 2d). This shift was previously recognized by cytology but was hypothesized to result from a pericentric inversion³⁶. However, the high degree of collinearity between genomes supports the hypothesis that Cen4D in

[Fig. 3 image, panels a–c. a, Photographs labelled Adult, Larvae, Healthy and Damaged. b, Haplotype blocks along chromosome 2B (0 Mb to 800 Mb) for Paragon, Robigus, Mace, Claire, CDC Stanley, Chinese Spring, Weebill 1, Norin 61, Cadenza, Julius, ArinaLrFor, LongReach Lancer, SY Mattis, Jagger and CDC Landmark, with the position of Sm1 marked; below, zoomed view from 5 Mb to 25 Mb. c, Physical-map intervals (Mb) for Chinese Spring (Sm1 non-carrier) and CDC Landmark (Sm1 carrier); graphical genotypes of haplotypes 1 (Sm1 carrier), 2 (Sm1 non-carrier) and 3 (Sm1 non-carrier, that is, Waskada); gene model with NB-ARC, LRR, S/T kinase, MSP and Transmembrane domains, Mutations (W98*, G182R) and Alternative haplotype.]
Fig. 3 | Cloning of the gene Sm1. a, The orange wheat blossom midge oviposits eggs on wheat spikes and the larvae feed on developing wheat grains, resulting in moderate to severe damage to mature kernels. b, Top, sections of chromosome 2B of the same colour in the same position share haplotypes (based on 5-Mb bins), with the exception of those in grey, which indicates a line-specific haplotype. The position of Sm1 is indicated with respect to the CDC Landmark assembly. Bottom, zoomed-in view of haplotype blocks (based on 250-kb bins) from 5 to 25 Mb positions on chromosome 2B, surrounding Sm1. CDC Landmark, Robigus and Paragon all carry the same haplotype surrounding Sm1 (teal). c, Top, anchoring of the Sm1 fine map to the physical maps of Chinese Spring and CDC Landmark and graphical genotypes of three haplotypes critical to localizing the Sm1 candidate gene. Bottom, annotation of the Sm1 candidate gene, which encodes NB-ARC and LRR motifs in addition to the integrated serine/threonine (S/T) kinase and MSP domains. Two independent ethyl-methanesulfonate-induced mutations (W98* and G182R) result in loss of function and susceptibility to the orange wheat blossom midge (light blue lines). An alternative haplotype was observed in the kinase region of Waskada (black).
Chinese Spring has shifted to a non-homologous position; this shifting of centromeres to non-homologous sites has also been reported in maize³⁷. By characterizing the centromere positions for these diverse wheat lines, we provide strong evidence for changes in centromere position caused by structural rearrangements and centromere shifts.

Large-scale structural variation between genomes

Structural variants are common in wheat³⁸, and impact genome structure and gene content. We characterized large structural variants using pairwise genome alignments (Extended Data Fig. 1) and changes in the three-dimensional topology of chromosomes revealed by Hi-C conformation capture directionality biases along the genome³⁹,⁴⁰ (Extended Data Fig. 8, Supplementary Table 23), which were confirmed by Oxford Nanopore long-read sequencing (Extended Data Fig. 2) and cytological karyotyping (Extended Data Fig. 7c, Supplementary Table 24, Supplementary Note 6). The most prominent event was a translocation between chromosomes 5B and 7B, observed in ArinaLrFor, SY Mattis (Fig. 2e–g) and Claire. Normally, chromosomes 5B and 7B are approximately 737 and 762 Mb long, respectively, and we estimated that the recombined chromosomes are 488 Mb (5BS/7BS) and 993 Mb (7BL/5BL) long, making 7BL/5BL the largest wheat chromosome (Extended Data Fig. 9a). In ArinaLrFor and SY Mattis, the 7BL/5BL breakpoint resides within an approximately 5-kb GAA microsatellite, which we were able to span using polymerase chain reaction (PCR) (Extended Data Fig. 9b, c). By contrast, the breakpoint on 5BS/7BS was less syntenic, and we detected polymorphic fluorescence in situ hybridization signals between ArinaLrFor and SY Mattis on the 5BS portion of the translocated chromosome segment, suggesting that the regions adjacent to the translocation events differ on 5BS/7BS (Supplementary Note 6). To determine the stability of the translocation in breeding, we genotyped for the translocation event in a panel of 538 wheat lines that represent most of the UK wheat gene pool grown since the 1920s⁴¹. The translocation occurred in 66% of the lines and was selectively neutral (Supplementary Note 7). Notably, the Ph1 locus on chromosome 5B, which controls the pairing of homeologous chromosomes during meiosis⁴², is near the translocation breakpoint, but remained highly syntenic between translocation carriers and non-carriers. Genetic mapping and analysis of short-read sequencing data indicated that the 5B/7B translocated chromosomes recombine freely with 5B and 7B chromosomes (Extended Data Fig. 9d), suggesting that chromosome pairing is not affected by the translocation.

Haplotype-based gene mapping

To develop improved wheat cultivars, breeders shuffle allelic variants by making targeted crosses and exploiting the recombination that occurs during meiosis. These alleles, however, are not inherited independently, but rather as haplotype blocks that often extend across multiple genes that are in genetic linkage⁴³,⁴⁴. We quantified haplotype variation along chromosomes across the assemblies, and developed visualization software to support its utility (Supplementary Note 8). We used these haplotypes to characterize a locus that provides resistance to the orange wheat blossom midge (OWBM, Sitodiplosis mosellana Géhin), one of the most damaging insect pests of wheat, which is endemic in Europe, North America, west Asia and the Far East. Upon hatching, the first-instar larvae feed on the developing grains and damage the kernels (Fig. 3a). Sm1 is the only gene in wheat known to provide resistance to OWBM⁶. CDC Landmark, Robigus and Paragon are all resistant to the OWBM, and all three carry the same 7.3-Mb haplotype within the Sm1 locus on chromosome 2B (Fig. 3b). To identify Sm1 gene candidates, we used high-resolution genetic mapping and refined the locus to a 587-kb interval in the CDC Landmark RQA (Fig. 3c, Extended Data Fig. 10a, Supplementary Table 25).

Through extensive genotyping of diverse breeding lines, we found an OWBM-susceptible line, Waskada, that displayed a resistant haplotype except near one gene, which we annotated in CDC Landmark to encode a canonical NLR with kinase and major sperm protein (MSP) integrated domains (Fig. 3c). Oxford Nanopore long-read sequencing further confirmed the structure of the gene in CDC Landmark (Extended Data Fig. 10b). By contrast, the remaining assemblies (susceptible to OWBM) lacked the NB-ARC domain, but the kinase and MSP domains remained intact (Fig. 3c). We sequenced the Waskada allele and found it contains the NB-ARC domain, but an alternative haplotype within the kinase domain (Fig. 3c, Extended Data Fig. 10c). This gene is expressed in wheat kernels and seedlings of Sm1 carrier lines, and the lack of cDNA amplification of the NB-ARC domain for non-carrier lines further supported an alternative gene structure (Extended Data Fig. 10c). We generated two knockout-mutant lines of this candidate gene in the Sm1 carrier line Unity⁴⁵, and both were consistently rated as susceptible to OWBM (Supplementary Table 26). Sequencing of the candidate gene in these two mutants revealed a single point mutation in each line: a G>A mutation resulting in a Gly>Arg (G182R) amino acid substitution in the NB-ARC domain, and a G>A mutation, resulting in a stop codon (W98*) before the NB-ARC domain (Fig. 3c). The kinase domain encoded by Sm1 belongs to the serine/threonine class⁴⁶, similar to those of Rpg5, which provides stem rust resistance⁴⁷, and Tsn1, which encodes sensitivity to the necrotrophic effector ToxA produced by Parastagonospora nodorum and Pyrenophora tritici-repentis⁴⁸; however, both Rpg5 and Tsn1 lack the MSP domain. To our knowledge, this is the first report of an NB-ARC-LRR-kinase-MSP coding gene associated with insect resistance. Additional research is needed to functionally validate these domains and their putative role in OWBM resistance using tools such as gene editing. Nevertheless, we developed a high-throughput and low-cost competitive allele-specific PCR marker (KASP) that discriminates between OWBM-susceptible and OWBM-resistant lines with perfect accuracy (Extended Data Fig. 10d, Supplementary Table 27).

Our analyses, along with the haplotype and synteny viewers (https://kiranbandi.github.io/10wheatgenomes/, http://10wheatgenomes.plantinformatics.io/ and http://www.crop-haplotypes.com/), laid the foundation for identifying haplotypes for Sm1. Haplotypes can now be genotyped in breeding programs using single-marker or high-throughput-sequencing-based approaches, which can integrate desirable genes into improved cultivars more efficiently.

Discussion

We have built on the genome-sequence resources available for wheat and related species to produce ten RQAs and five scaffolded assemblies that represent hexaploid wheat lines from different regions, growth habits and breeding programs¹,¹¹,¹²,¹⁸,²⁰,⁴⁹. We have identified and characterized SNPs, PAV, CNV, centromere shifts, large-scale structural variants and introgressions from wild relatives of wheat that can be used to identify and characterize important breeding targets. This was complemented by a transposable-element-analysis approach to identify candidate introgressions from wild relatives of wheat, for which we provided high-quality assemblies of segments already used in global breeding programs. Together, these RQAs present an opportunity for breeders and researchers to perform high-resolution manipulation of genomic segments and pave the way to identifying genes responsible for in-demand traits, as we demonstrated for resistance to the insect pest OWBM. Functional gene studies will also be facilitated by comparative gene analyses, as exemplified by our analyses of orthologous groups, Rf genes and NLR immune receptors²⁶. Finally, we highlight haplotype blocks, which will facilitate marker development for applied breeding⁴³,⁵⁰. Equipped with multiple layers of data describing variation in wheat, we now have powerful tools to increase the rate of wheat improvement to meet future food demands.

Online content

Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2961-x.

1. The International Wheat Genome Sequencing Consortium. Shifting the limits in wheat research and breeding using a fully annotated reference genome. Science 361, eaar7191 (2018).
2. Montenegro, J. D. et al. The pangenome of hexaploid bread wheat. Plant J. 90, 1007–1013 (2017).
3. International Wheat Genome Sequencing Consortium (IWGSC). A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345, 1251788 (2014).
4. He, F. et al. Exome sequencing highlights the role of wild-relative introgression in shaping the adaptive landscape of the wheat genome. Nat. Genet. 51, 896–904 (2019); correction 51, 1194 (2019).
5. Pont, C. et al. Tracing the ancestry of modern bread wheats. Nat. Genet. 51, 905–911 (2019).
6. Kassa, M. T. et al. A saturated SNP linkage map for the orange wheat blossom midge resistance gene Sm1. Theor. Appl. Genet. 129, 1507–1517 (2016).
7. Tadesse, W. et al. Genetic gains in wheat breeding and its role in feeding the world. Crop Breed. Genet. Genom. 1, e190005 (2019).
8. Zhao, Q. et al. Pan-genome analysis highlights the extent of genomic variation in cultivated and wild rice. Nat. Genet. 50, 278–284 (2018).
9. Dubcovsky, J. & Dvorak, J. Genome plasticity a key factor in the success of polyploid wheat under domestication. Science 316, 1862–1866 (2007).
10. Marcussen, T. et al. Ancient hybridizations among the ancestral genomes of bread wheat. Science 345, 1250092 (2014).
11. Avni, R. et al. Wild emmer genome architecture and diversity elucidate wheat evolution and domestication. Science 357, 93–97 (2017).
12. Maccaferri, M. et al. Durum wheat genome highlights past domestication signatures and future improvement targets. Nat. Genet. 51, 885–895 (2019).
13. Zimin, A. V. et al. The first near-complete assembly of the hexaploid bread wheat genome, Triticum aestivum. Gigascience 6, 1–7 (2017).
14. Winfield, M. O. et al. Targeted re-sequencing of the allohexaploid wheat exome. Plant Biotechnol. J. 10, 733–742 (2012).
15. Arora, D., Gross, T. & Brueggeman, R. Allele characterization of genes required for rpg4-mediated wheat stem rust resistance identifies Rpg5 as the R gene. Phytopathology 103, 1153–1161 (2013).
16. Adamski, N. M. et al. A roadmap for gene functional characterisation in crops with large genomes: lessons from polyploid wheat. eLife 9, e55646 (2020).
17. Uauy, C. Wheat genomics comes of age. Curr. Opin. Plant Biol. 36, 142–148 (2017).
18. Mascher, M. et al. A chromosome conformation capture ordered sequence of the barley genome. Nature 544, 427–433 (2017).
19. Edwards, D. et al. Bread matters: a national initiative to profile the genetic diversity of Australian wheat. Plant Biotechnol. J. 10, 703–708 (2012).
20. Jordan, K. W. et al. A haplotype map of allohexaploid wheat reveals distinct patterns of selection on homoeologous genomes. Genome Biol. 16, 48 (2015).
21. Paape, T. et al. Patterns of polymorphism and selection in the subgenomes of the allopolyploid Arabidopsis kamchatica. Nat. Commun. 9, 3909 (2018).
22. Paape, T. et al. Conserved but attenuated parental gene expression in allopolyploids: Constitutive zinc hyperaccumulation in the allotetraploid Arabidopsis kamchatica. Mol. Biol. Evol. 33, 2781–2800 (2016).
23. Melonek, J., Stone, J. D. & Small, I. Evolutionary plasticity of restorer-of-fertility-like proteins in rice. Sci. Rep. 6, 35152 (2016).
24. Bernhard, T., Koch, M., Snowdon, R. J., Friedt, W. & Wittkop, B. Undesired fertility restoration in msm1 barley associates with two mTERF genes. Theor. Appl. Genet. 132, 1335–1350 (2019).
25. Whitford, R. et al. Hybrid breeding in wheat: technologies to improve hybrid wheat seed production. J. Exp. Bot. 64, 5411–5428 (2013).
26. Keller, B., Wicker, T. & Krattinger, S. G. Advances in wheat and pathogen genomics: Implications for disease control. Annu. Rev. Phytopathol. 56, 67–87 (2018).
27. Steuernagel, B. et al. Rapid cloning of disease-resistance genes in plants using mutagenesis and sequence capture. Nat. Biotechnol. 34, 652–655 (2016).
28. Bariana, H. S. et al. Mapping of durable adult plant and seedling resistances to stripe rust and stem rust diseases in wheat. Aust. J. Agric. Res. 52, 1247–1255 (2001).
29. Chemayek, B. et al. Tight repulsion linkage between Sr36 and Sr39 was revealed by genetic, cytogenetic and molecular analyses. Theor. Appl. Genet. 130, 587–595 (2017).
30. Cruz, C. D. et al. The 2NS translocation from Aegilops ventricosa confers resistance to the Triticum pathotype of Magnaporthe oryzae. Crop Sci. 56, 990–1000 (2016).
31. Helguera, M. et al. PCR assays for the Lr37-Yr17-Sr38 cluster of rust resistance genes and their use to develop isogenic hard red spring wheat lines. Crop Sci. 43, 1839–1847 (2003).
32. Li, Y. & Wei, K. Comparative functional genomics analysis of cytochrome P450 gene superfamily in wheat and maize. BMC Plant Biol. 20, 93 (2020).
33. Gunupuru, L. R. et al. A wheat cytochrome P450 enhances both resistance to deoxynivalenol and grain yield. PLoS ONE 13, e0204992 (2018).
34. Li, B. et al. Wheat centromeric retrotransposons: the new ones take a major role in centromeric structure. Plant J. 73, 952–965 (2013).
35. Gent, J. I., Wang, K., Jiang, J. & Dawe, R. K. Stable patterns of CENH3 occupancy through maize lineages containing genetically similar centromeres. Genetics 200, 1105–1116 (2015).

36. Koo, D. H., Sehgal, S. K., Friebe, B. & Gill, B. S. Structure and stability of telocentric chromosomes in wheat. PLoS ONE 10, e0137747 (2015).
37. Schneider, K. L., Xie, Z., Wolfgruber, T. K. & Presting, G. G. Inbreeding drives maize centromere evolution. Proc. Natl Acad. Sci. USA 113, E987–E996 (2016).
38. Saxena, R. K., Edwards, D. & Varshney, R. K. Structural variations in plant genomes. Brief. Funct. Genomics 13, 296–307 (2014).
39. Harewood, L. et al. Hi-C as a tool for precise detection and characterisation of chromosomal rearrangements and copy number variation in human tumours. Genome Biol. 18, 125 (2017).
40. Himmelbach, A. et al. Discovery of multi-megabase polymorphic inversions by chromosome conformation capture sequencing in large-genome plant species. Plant J. 96, 1309–1316 (2018).
41. Fradgley, N. et al. A large-scale pedigree resource of wheat reveals evidence for adaptation and selection by breeders. PLoS Biol. 17, e3000071 (2019).
42. Martín, A. C., Rey, M. D., Shaw, P. & Moore, G. Dual effect of the wheat Ph1 locus on chromosome synapsis and crossover. Chromosoma 126, 669–680 (2017).
43. Bevan, M. W. et al. Genomic innovation for crop improvement. Nature 543, 346–354 (2017).
44. Luján Basile, S. M. et al. Haplotype block analysis of an Argentinean hexaploid wheat collection and GWAS for yield components and adaptation. BMC Plant Biol. 19, 553 (2019).
45. Fox, S. L. et al. Unity hard red spring wheat. Can. J. Plant Sci. 90, 71–78 (2010).
46. Hanks, S. K., Quinn, A. M. & Hunter, T. The protein kinase family: conserved features and deduced phylogeny of the catalytic domains. Science 241, 42–52 (1988).
47. Brueggeman, R. et al. The stem rust resistance gene Rpg5 encodes a protein with nucleotide-binding-site, leucine-rich, and protein kinase domains. Proc. Natl Acad. Sci. USA 105, 14970–14975 (2008).
48. Faris, J. D. et al. A unique wheat disease resistance-like gene governs effector-triggered susceptibility to necrotrophic pathogens. Proc. Natl Acad. Sci. USA 107, 13544–13549 (2010).
49. Luo, M. C. et al. Genome sequence of the progenitor of the wheat D genome Aegilops tauschii. Nature 551, 498–502 (2017).
50. Borrill, P., Harrington, S. A. & Uauy, C. Applying the latest advances in genomics and phenomics for trait discovery in polyploid wheat. Plant J. 97, 56–72 (2019).

published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2020

1Crop Development Centre, University of Saskatchewan, Saskatoon, Saskatchewan, Canada. 2Grain Research Laboratory, Canadian Grain Commission, Winnipeg, Manitoba, Canada. 3Department of Plant Pathology, Kansas State University, Manhattan, KS, USA. 4Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany. 5Helmholtz Zentrum München—German Research Center for Environmental Health, Neuherberg, Germany. 6Aquatic and Crop Resource Development, National Research Council Canada, Saskatoon, Saskatchewan, Canada. 7John Innes Centre, Norwich Research Park, Norwich, UK. 8Department of Plant and Microbial Biology, University of Zurich, Zurich, Switzerland. 9Morden Research and Development Centre, Agriculture and Agri-Food Canada, Morden, Manitoba, Canada. 10Department of Computer Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada. 11Brandon Research and Development Centre, Agriculture and Agri-Food Canada, Brandon, Manitoba, Canada. 12Genomics/Transcriptomics group, Functional Genomics Center Zurich, Zurich, Switzerland. 13Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland. 14Institute of Agricultural Sciences, ETHZ, Zurich, Switzerland. 15Kihara Institute for Biological Research, Yokohama City University, Yokohama, Japan. 16Life Sciences Department, Natural History Museum, London, UK. 17Earlham Institute, Norwich Research Park, Norwich, UK. 18The John Bingham Laboratory, NIAB, Cambridge, UK. 19Department of Agronomy and Plant Genetics, University of Minnesota, Saint Paul, MN, USA. 20Global Institute for Food Security, University of Saskatchewan, Saskatoon, Saskatchewan, Canada. 21School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Israel. 22Department of Entomology, University of Manitoba, Winnipeg, Manitoba, Canada. 23Institute of Crop Science, NARO, Tsukuba, Japan. 24Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada. 25National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan. 26Laboratory of Plant Genetics, Graduate School of Agriculture, Kyoto University, Kyoto, Japan. 27Humanome Lab, Tokyo, Japan. 28Global Wheat Program, International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico. 29Montana BioAg, Missoula, MT, USA. 30Australian Research Council Centre of Excellence in Plant Energy Biology, School of Molecular Sciences, University of Western Australia, Perth, Western Australia, Australia. 31Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, Ontario, Canada. 32Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, Victoria, Australia. 33Syngenta, Durham, NC, USA. 34School of Agriculture, Food and Wine, University of Adelaide, Adelaide, South Australia, Australia. 35German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany. 36Biological and Environmental Science & Engineering Division, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia. 37Graduate School of Life and Environmental Sciences, Kyoto Prefectural University, Kyoto, Japan. 38Institute of Evolution and Department of Evolutionary and Environmental Biology, University of Haifa, Haifa, Israel. 39School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany. 40Center for Integrated Breeding Research (CiBreed), Georg-August-University Göttingen, Göttingen, Germany. 41These authors contributed equally: Sean Walkowiak, Liangliang Gao, Cecile Monat.
✉e-mail: curt.mccartney@canada.ca; manuel.spannagl@helmholtz-muenchen.de; wicker@botinst.uzh.ch; curtis.pozniak@usask.ca

Article
Methods

No statistical methods were used to predetermine sample size. The field experiments were randomized, but the wheat lines sequenced and assembled were not selected at random. The investigators were not blinded to allocation during experiments and outcome assessment.

Assemblies and annotation

Genome assemblies. We assembled the genomes of 15 diverse wheat lines using two approaches (Supplementary Table 1). The RQA approach used the DeNovoMAGIC v.3.0 assembly pipeline, previously used for the wild emmer wheat11, durum wheat12 and Chinese Spring RefSeqv1.0 assemblies. In brief, high-molecular-weight DNA was extracted from wheat seedlings as described previously51. Illumina 450-bp paired-end (PE), 800-bp PE and mate-pair (MP) libraries of three different sizes (3 kb, 6 kb and 9 kb) were generated. Sequencing was performed at the University of Illinois Roy J. Carver Biotechnology Center. 10X Genomics Chromium libraries were prepared and sequenced at the Genome Canada Genome Innovation Centre using the manufacturers’ recommendations to achieve a minimum of 30× coverage. Hi-C libraries were prepared using previously described methods40. Using the Illumina PE, MP, 10X Genomics Chromium and Hi-C data, chromosome-scale assemblies were prepared as described previously18. For cultivars assembled to a scaffold level, we used the W2RAP-contigger with k = 200 (Supplementary Note 1). Two MP libraries (10 kb and 13 kb) were produced for each line except Weebill 1, for which two additional MP libraries were used. Mate pairs were processed, filtered and used to scaffold contigs as described in the W2RAP pipeline (https://github.com/bioinfologics/w2rap). Scaffolds of less than 500 bp were removed from the final assemblies. Additionally, we performed Oxford Nanopore sequencing of CDC Landmark using R9 flow cells and the GridION sequencing technology (Supplementary Note 2).

Nucleotide diversity analysis. The variant call format data files from two wheat exome-capture studies4,5 were retrieved, combined, and filtered to retain hexaploid accessions and polymorphisms detected in both studies. The 10X Genomics Chromium sequencing data for each of the RQA lines were aligned to Chinese Spring RefSeqv1.0 using the LongRanger v.2.1.6 software. Alignment files from the accessions assembled here and 16 Bioplatforms Australia lines19 with alignments obtained from the DAWN project52 were then used for variant calling by GATK v.3.8 at the same genomic positions identified by exome-capture sequencing. The variant files from the exome-capture studies, DAWN project and 10+ Wheat Genomes lines were then merged and subjected to principal component analysis (PCA) using the prcomp function in R v.3.6.1.

Gene projections. We used the previously published high-confidence gene models for Chinese Spring to assess the gene content in each assembly. Representative coding sequences of each informant locus were aligned to pseudomolecules of each line separately using BLAT53 v.3.5 with the ‘fine’ parameter and a maximal intron size of 70 kb. BLAT matches seeded an additional alignment by exonerate54 in the genomic neighbourhood encompassing 20 kb upstream and downstream of the match position. Exonerate alignments required minimal and maximal intron sizes of 30 bp and 20 kb, respectively. A linear regression of colocalized matches with complete alignments of the informant was computed for 10,000 such pairs to derive a normalization function and to render comparable scoring schemes for both methods. Subsequently, we selected the top-scoring match for each mapping pair as the locus for the gene projection. Projections were then filtered by alignment coverage (Supplementary Note 3), the open reading frame (ORF) contiguity, the observed mapping frequency of the informant, coverage of start and stop codons, and the orthology or potential dislocation of the match scaffold relative to its informant chromosome. Identification of orthologous groups was analogous to the approach used previously55. Reciprocal best BLAST hit (RBH) graphs were derived from pairwise all-against-all BLASTn v.2.8 transcript searches (minimal e-value ≤ 1 × 10^−30). Hits were assigned to homeologous groups on the basis of gene models of Chinese Spring following a previously described homeologue classification9. Multiple sequence alignments for the population genetics analysis were performed using MUSCLE v.3.8 with default parameters (Supplementary Note 3). Using the gene projections, we quantified average pairwise genetic diversity (π), polymorphism (Watterson’s θW), and Tajima’s D using compute and polydNdS in the libsequence v.1.0.3-1 package56. We retained diversity estimates for genes that were in all of the genomes and had ≤100 segregating sites. PAV was determined from the orthologous groups limited to one-to-one relations where there was no match in at least one genome.

Analysis of the Rf-like gene family. For Rf genes, the genome sequences were scanned for ORFs in six-frame translations with the getorf program of the EMBOSS v.6.6.0 package. ORFs longer than 89 codons were searched for the presence of PPR motifs using hmmsearch from the HMMER v.3.2.1 package (http://hmmer.org) and the hidden Markov models defined previously. The PF02536 profile from the Pfam v.32.0 database (http://pfam.xfam.org) was used to screen for ORFs carrying mTERF motifs. Downstream processing of the hmmsearch results followed the pipeline described previously57. ORFs with low hmmsearch scores were removed from the analysis as they are unlikely to represent functional PPR proteins. Only genes encoding mTERF proteins longer than 100 amino acids were included in the analysis. RFL-PPR sequences were identified as described23. The phylogenetic analyses were performed as described previously23. Conserved, non-PPR genes delimiting the borders of analysed RFL clusters were identified in the Chinese Spring RefSeqv1.0 reference genome and used to search for syntenic regions in the remaining wheat accessions with BLAST v.2.8. See Supplementary Note 4 for more details.

NLR repertoire. NLR signatures were annotated using NLR-Annotator58,59 (https://github.com/steuernb/NLR-Annotator) with the option -a. We estimated redundancy of NLR signatures between genomes at different thresholds of identity: 95%, 98% and 100%. For the 165 amino acids in the consensus of all NB-ARC motifs, this translates to 8, 3 and 0 mismatches of a concatenated motif sequence. To calculate the overall redundancy in all genomes, we counted the number of NLR signatures added to a non-redundant set by adding genomes iteratively. This was done for 1 million random permutations.

Repeat annotation. Transposons were detected and classified by a homology search against the REdat_9.7_Poaceae section of the PGSB transposon library60 using vmatch (http://www.vmatch.de) with the following parameters: identity ≥70%, minimal hit length 75 bp, seed length 12 bp (exact command line: -d -p -l 75 -identity 70 -seedlength 12 -exdrop 5). To remove overlapping annotations, the output was filtered for redundant hits via a priority-based approach in which higher-scoring matches were assigned first and lower-scoring hits at overlapping positions were either shortened or removed if there was ≥90% overlap with a priority hit or if <50 bp remained. Tandem repeats were identified with TandemRepeatFinder v.4.09 under default parameters61 and subjected to overlap removal as described above. Full-length LTR retrotransposons were identified with LTRharvest (http://genometools.org/documents/ltrharvest.pdf). All candidates were subsequently annotated for PfamA domains using HMMER v.3.0 and filtered to remove false positives, non-canonical hybrids and gene-containing elements. The inner domain order served as a criterion for the LTR retrotransposon superfamily classification, either Gypsy (RLG: RT-RH-INT), Copia (RLC: INT-RT-RH) or undetermined (RLX). The insertion age of fl-LTRs was calculated from the divergence between the 5′ and 3′ long terminal repeats, which are identical upon insertion. The genetic distance was calculated with EMBOSS v.6.6.0 distmat (Kimura 2-parameter correction) using a random mutation rate of 1.3 × 10^−8.

Analysis of centromeric regions. For each line with an RQA, ChIP was performed according to previous methods62 with slight modification using a wheat-specific CENH3 antibody36. An antigen with the peptide sequence RTKHPAVRKTKALPKK, corresponding to the N terminus of wheat CENH3, was used to produce an antibody using the custom-antibody production facility provided by Thermo Fisher Scientific. The customized antibody was purified and obtained as pellets. The antibody pellet (0.396 mg) was dissolved in 2 ml PBS buffer, pH 7.4, resulting in a working concentration of 198 ng μl^−1. Nuclei were isolated from 2-week-old seedlings, digested with micrococcal nuclease and incubated overnight at 4 °C with 3 μg of antibody or rabbit serum (control). Antibodies were captured using Dynabeads Protein G and the chromatin eluted using 100 μl of 1% sodium dodecyl sulfate, 0.1 M NaHCO3 preheated to 65 °C. DNA isolation was then performed using ChIP DNA Clean & Concentrator Kit, and ChIP–seq libraries were constructed using TruSeq ChIP Library Preparation Kit and sequenced with a NovaSeq S4, which generated 150-bp paired-end reads.

For Chinese Spring, we used two datasets, SRR1686799 (ref. 63; dataset 1) and the dataset generated in this study (dataset 2). Sequence reads were de-multiplexed, trimmed and aligned to each of the respective RQAs using HISAT2 v.2.1.0 (ref. 64). Alignments were sorted, filtered for minimum alignment quality of 30, counted in 100-kb bins using samtools v.1.10 and BEDtools v.2.29, and visualized in R v.3.6.1. To define the midpoint of each centromere, we identified the highest density of CENH3 ChIP–seq reads using a smoothing spline in R v.3.6.1 with the smooth.spline function (number of knots = 1,000) and identified the peak of the smooth spline as the centre of the respective centromere for a given chromosome. To compare centromeric positions of different genomes, the CENH3 ChIP–seq density was plotted along with MUMmer v.4.0 chromosome alignments. To determine the overall size of wheat centromeres, we considered each 100-kb bin with CENH3 ChIP–seq read density that was greater than three times the background (genome average) level of read density to be an active centromeric bin. The number of enriched bins for each genome was counted and averaged across the 21 chromosomes. This calculation included counting of unanchored bins.

Analysis of introgressions

Identification of full-length RLC-Angela retrotransposons. Retrotransposon profiles were created for each genome using the RLC-Angela family65 and consensus sequences obtained from the TREP database (www.botinst.uzh.ch/en/research/genetics/thomasWicker/trep-db.html). First, BLASTn was used to compare the ~1,700-bp LTR of RLC-Angela to each genome. Matching elements and 500 bp of flanking sequences were aligned to identify precise LTR borders as well as different sub-families and/or sequence variants. We then used BLASTn to compare the 18 consensus LTR sequences against each genome and then screened for pairs of full-length LTRs that are found in the same orientation within a window of 7.5–9.5 kb (RLC-Angela elements are ~8.7 kb long). These initial candidate full-length elements were screened for the presence of RLC-Angela polyprotein sequences by BLASTx, as well as for the typical 5-bp target-site duplications. We allowed a maximum of two mismatches between the two target-site duplications. All identified full-length RLC-Angela copies were then aligned to an RLC-Angela consensus sequence with the program Water from the EMBOSS v.6.6.0 package (www.ebi.ac.uk/Tools/emboss/). These alignments were used to compile all nucleotide polymorphisms into a single file. The variant call file was then used for PCA using the snpgdsPCA function in the R package SNPRelate v.3.11.

Sequencing of the tertiary gene pool of wheat. Genomic DNA (gDNA) was extracted and purified from young leaf tissue collected from multiple accessions of T. timopheevii, A. ventricosa and T. ponticum (Supplementary Table 12) following a standard CTAB–chloroform extraction method. Yield and integrity were evaluated by fluorometry (Qubit 2.0) and agarose gel electrophoresis. Paired-end libraries were prepared following the Nextera DNA Flex protocol. In brief, 500 ng gDNA from each accession was fragmented and amplified with a limited-cycle PCR. Each library was uniquely dual-indexed with a distinct 10-bp index code (IDT for Illumina Nextera DNA UD) for multiplexing, and quantified by qPCR (Kapa Biosystems). Final average library size was estimated on a Tapestation 2200. Libraries were normalized and pooled for sequencing on an Illumina NovaSeq 6000 S4 to generate ~5× coverage per genotype. Sequencing data were de-multiplexed and aligned to appropriate RQAs (Supplementary Table 12) in semi-perfect mode using the BBMap v.38 short-read alignment software (https://sourceforge.net/projects/bbmap/).

Structural variation

We karyotyped the lines using mitotic metaphase chromosomes prepared by the conventional acetocarmine-squash method. Non-denaturing fluorescence in situ hybridization (ND-FISH) of three repetitive sequence probes, Oligo-pSc119.2-1, Oligo-pTa535 and Oligo-pTa713, was performed as described66,67 (Supplementary Note 6). Chromosomes were counterstained with DAPI. Chromosome images were captured with an Olympus BX61 epifluorescence microscope and a CCD camera DP80. Images were processed and pseudocoloured with ImageJ v.1.51n in the Fiji package. For karyotyping, at least four chromosomes per accession were examined and compared to the karyotype of Chinese Spring as described previously68. Hierarchical clustering of karyotype polymorphisms was performed using the Ward method in R v.3.0.2, which was used to estimate distance. Next, we applied Hi-C analysis for inversion calling as described previously40. In brief, adapters were removed and reads were mapped to Chinese Spring using minimap2 v.2.10 (ref. 69) as we have done previously21. The raw Hi-C link counts were calculated in 1-Mb non-overlapping sliding windows and then normalized as described in our previous work40. Finally, the normalized Hi-C link matrix was subjected to inversion calling using R.

We performed flow cytometry of wheat cultivars Arina and Forno as previously described70, except that we used a FACSAria SORP flow cytometer and cell sorter (Becton Dickinson). The 5B/7B translocation breakpoints were identified by comparison of chromosomes 5B and 7B from ArinaLrFor and Julius. Sequence collinearity between ArinaLrFor and Julius was detected by BLASTn searches of 1,000-bp sequence windows every 100 kb along the chromosomes. Once an interruption of synteny was detected, sequence segments at the positions of synteny loss were extracted and used for local alignments to determine the precise breakpoint positions. PCR amplification of the 5BS/7BS and 7BL/5BL translocation sites was performed using standard PCR cycling conditions.

Characterization of haplotypes

Development of a wheat genome haplotype database. To identify haplotypes, pairwise chromosome alignments were performed between the RQAs using MUMmer v.4.0, which were combined with pairwise nucleotide BLASTn analyses of the genes ± 2,000 bp using custom scripts in R v.3.6.1 (https://github.com/Uauy-Lab/pangenome-haplotypes)71 (Supplementary Note 8). The resultant haplotypes were uploaded to an interactive viewer (http://www.crop-haplotypes.com/). Pairwise BLASTn comparisons of the genes were also used to identify structural variants, and were uploaded into AccuSyn (https://accusyn.usask.ca/) and SynVisio (https://synvisio.github.io/#/) to create a wheat-specific database (https://kiranbandi.github.io/10wheatgenomes/). Pretzel (https://github.com/plantinformatics/pretzel) was also used to visualize and compare the RQAs and the projected gene annotations (http://10wheatgenomes.plantinformatics.io/).
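
As an illustration of the insertion-age estimate described under Repeat annotation, the following Python sketch computes a Kimura 2-parameter distance between two pre-aligned LTR sequences and converts it into an age. It is not the EMBOSS distmat implementation used in the study. The function names are hypothetical, the conversion T = K / (2r) is the commonly used relation and is an assumption here because the text does not spell it out, and the rate of 1.3 × 10^−8 is assumed to be in substitutions per site per year.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(ltr5: str, ltr3: str) -> float:
    """Kimura 2-parameter distance between two aligned, equal-length sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # ignore gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    inner = (1 - 2 * p - q) * math.sqrt(1 - 2 * q) if q < 0.5 else 0.0
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(inner)


def insertion_age_years(distance: float, rate: float = 1.3e-8) -> float:
    """Assumed conversion: both LTRs accumulate changes, so T = K / (2 * rate)."""
    return distance / (2 * rate)


if __name__ == "__main__":
    # Toy 10-bp 'LTRs' differing by a single transition (hypothetical data).
    k = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
    print(f"K2P distance: {k:.4f}; estimated age: {insertion_age_years(k):,.0f} years")
```

Because the two LTRs of an element are identical at insertion, a distance of zero corresponds to a very recent insertion, and older elements show proportionally larger distances under this assumed clock.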
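
The active-bin criterion described under Analysis of centromeric regions (100-kb bins whose CENH3 ChIP–seq read density exceeds three times the genome-average background) can be sketched as follows. This is a minimal illustration on toy counts, not the authors' pipeline, which used samtools, BEDtools and R. The function names and the example data are hypothetical.

```python
from typing import List, Sequence

BIN_SIZE_BP = 100_000  # 100-kb bins, as stated in the Methods


def active_centromeric_bins(bin_counts: Sequence[float], fold: float = 3.0) -> List[int]:
    """Return indices of bins whose read count exceeds `fold` times the genome average."""
    if not bin_counts:
        raise ValueError("no bins supplied")
    background = sum(bin_counts) / len(bin_counts)  # genome-average read density
    return [i for i, count in enumerate(bin_counts) if count > fold * background]


def centromere_size_bp(bin_counts: Sequence[float], fold: float = 3.0) -> int:
    """Total length covered by active bins, in base pairs."""
    return len(active_centromeric_bins(bin_counts, fold)) * BIN_SIZE_BP


if __name__ == "__main__":
    # Toy genome: 98 background bins plus two strongly enriched bins.
    counts = [1] * 98 + [100, 200]
    print(active_centromeric_bins(counts))
    print(centromere_size_bp(counts))
```

Using the genome-wide mean as background keeps the threshold independent of sequencing depth, although a few very strongly enriched bins raise the mean and therefore make the criterion slightly more conservative.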
Characterization of Sm1. Sm1-linked markers6 were located in RQAs using BLAST v.2.8.0. Two high-resolution mapping populations were developed, 99B60-EJ2D/Thatcher and 99B60-EJ2G/Infinity. Progeny heterozygous for crossover events near Sm1 were identified in the F2 generation, and the crossovers were fixed in the F3 generation. The resulting F2-derived F3 families were analysed with KASP markers within the Sm1 region and tested for resistance to OWBM in field nurseries to identify markers associated with Sm1. Ethyl methanesulfonate was used to develop knockout mutants in the Sm1 gene. Approximately 3,200 seeds of the Canadian spring wheat variety Unity (an Sm1 carrier) were soaked in a 0.2% (v/v) aqueous ethyl methanesulfonate solution for 22 h at 22 °C. The seed was then rinsed in distilled water and sown in a field nursery. The M1 seed was grown to maturity and bulk harvested. Approximately 6,000 M2 seeds were space planted in two field nurseries located in Brandon and Glenlea, Manitoba, Canada. Spikes were collected on a per-plant basis at maturity and were classified as resistant, susceptible or undamaged as done previously6,72. Putative Sm1-knockout mutants were re-tested for OWBM resistance in indoor cage tests73 in the M3 and M4 generations. M4-derived families were tested for resistance to OWBM in field nurseries (randomized complete block design, six environments, and eight replicates per environment). Candidate genes were identified between Sm1 flanking markers on the CDC Landmark assembly using the projected gene annotations and FGENESH v.2.6 (http://www.softberry.com/), which were compared to the projected genes of non-carriers. Both 5′ and 3′ rapid amplification of cDNA ends (5′ and 3′ RACE) were used to verify the transcription initiation and termination sites of the gene candidate, whose structure was predicted by FGENESH v.2.6. In brief, RNA was extracted from the leaves of Unity (Sm1 carrier) seedlings (using the Qiagen RNeasy kit), RACE PCR performed (Invitrogen GeneRacer kit), and the PCR product cloned (Invitrogen TOPO TA Cloning kit for sequencing) and sequenced by Sanger sequencing. Prediction of the conserved domains was done using the NCBI Conserved Domain Search tool (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) and PROSITE (release 2020_01; https://prosite.expasy.org/). The LRR domain was defined on the basis of the presence of 2–42 LRR motif repeats of 20–30 amino acids each. LRR motifs were manually annotated74. Prediction of transmembrane regions and orientation was performed using the program TMpred (https://embnet.vital-it.ch/software/TMPRED_form.html).

To study the expression of Sm1, total RNA was extracted from four biological replicates from four wheat genotypes (Unity, CDC Landmark, Waskada and Thatcher) from two different tissues: seedling leaves and developing kernels (five days post anthesis) using NucleoSpin RNA Plant kit (Macherey-Nagel) according to the manufacturer’s instructions. RNA was treated with RNase-free DNase (rDNase) (Macherey-Nagel) and reverse transcribed into cDNA using SuperScript IV Reverse Transcriptase kit (Invitrogen) according to the manufacturer’s instructions and the NB-ARC domain amplified by PCR.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability

All sequence reads and assemblies have been deposited into the National Center for Biotechnology Information sequence read archive (SRA) (see Supplementary Table 1 for accession numbers). Sequence reads for the RQAs, T. ponticum, A. ventricosa and T. timopheevii have been deposited into the SRA (accession no. PRJNA544491) and ChIP–seq short-read data used for centromere characterization are deposited under accession no. PRJNA625537. All Hi-C data have been deposited in the European Nucleotide Archive (Supplementary Table 1). The RQAs are available for direct user download at https://wheat.ipk-gatersleben.de/. All assemblies and projected annotations are available for comparative analysis at Ensembl Plants (https://plants.ensembl.org/index.html). Comparative analysis viewers are also online for synteny (https://kiranbandi.github.io/10wheatgenomes/, http://10wheatgenomes.plantinformatics.io/) and haplotypes (http://www.crop-haplotypes.com/). Seed stocks of the assembled lines are available at the UK Germplasm Resources Unit (https://www.seedstor.ac.uk/).

Code availability

Code for custom genome visualizers has been deposited in the public domain for the haplotype viewer (https://github.com/Uauy-Lab/pangenome-haplotypes), Pretzel (https://github.com/plantinformatics/pretzel), AccuSyn (https://github.com/jorgenunezsiri/accusyn) and SynVisio (https://github.com/kiranbandi/synvisio). Additional scripts used for ChIP–seq analysis of the centromeres are provided at https://github.com/wheatgenetics/centromere.

51. Dvorak, J., Mcguire, P. E. & Cassidy, B. Apparent sources of the A genomes of wheats inferred from polymorphism in abundance and restriction fragment length of repeated nucleotide-sequences. Genome 30, 680–689 (1988).
52. Watson-Haigh, N. S., Suchecki, R., Kalashyan, E., Garcia, M. & Baumann, U. DAWN: a resource for yielding insights into the diversity among wheat genomes. BMC Genomics 19, 941 (2018).
53. Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
54. Slater, G. S. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31 (2005).
55. Tatusov, R. L., Galperin, M. Y., Natale, D. A. & Koonin, E. V. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 28, 33–36 (2000).
56. Thornton, K. Libsequence: a C++ class library for evolutionary genetic analysis. Bioinformatics 19, 2325–2327 (2003).
57. Cheng, S. et al. Redefining the structural motifs that determine RNA binding and RNA editing by pentatricopeptide repeat proteins in land plants. Plant J. 85, 532–547 (2016).
58. Steuernagel, B. et al. Physical and transcriptional organisation of the bread wheat intracellular immune receptor repertoire. Preprint at https://doi.org/10.1101/339424 (2018).
59. Steuernagel, B. et al. The NLR-Annotator tool enables annotation of the intracellular immune receptor repertoire. Plant Physiol. 183, 468–482 (2020).
60. Spannagl, M. et al. PGSB PlantsDB: updates to the database framework for comparative plant genome research. Nucleic Acids Res. 44 (D1), D1141–D1147 (2016).
61. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
62. Nagaki, K. et al. Chromatin immunoprecipitation reveals that the 180-bp satellite repeat is the key functional DNA element of Arabidopsis thaliana centromeres. Genetics 163, 1221–1225 (2003).
63. Guo, X. et al. De novo centromere formation and centromeric sequence expansion in wheat and its wide hybrids. PLoS Genet. 12, e1005997 (2016).
64. Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
65. Wicker, T. et al. Impact of transposable elements on genome structure and evolution in bread wheat. Genome Biol. 19, 103 (2018).
66. Tang, Z., Yang, Z. & Fu, S. Oligonucleotides replacing the roles of repetitive sequences pAs1, pSc119.2, pTa-535, pTa71, CCS1, and pAWRC.1 for FISH analysis. J. Appl. Genet. 55, 313–318 (2014).
67. Zhao, L. et al. Cytological identification of an Aegilops variabilis chromosome carrying stripe rust resistance in wheat. Breed. Sci. 66, 522–529 (2016).
68. Komuro, S., Endo, R., Shikata, K. & Kato, A. Genomic and chromosomal distribution patterns of various repeated DNA sequences in wheat revealed by a fluorescence in situ hybridization procedure. Genome 56, 131–137 (2013).
69. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
70. Kubaláková, M., Vrána, J., Cíhalíková, J., Simková, H. & Doležel, J. Flow karyotyping and chromosome sorting in bread wheat (Triticum aestivum L.). Theor. Appl. Genet. 104, 1362–1372 (2002).
71. Brinton, J. et al. A haplotype-led approach to increase the precision of wheat breeding. Commun. Biol. https://doi.org/10.1038/s42003-020-01413-2 (2020).
72. Thomas, J. et al. Chromosome location and markers of Sm1: a gene of wheat that conditions antibiotic resistance to orange wheat blossom midge. Mol. Breed. 15, 183–192 (2005).
73. Lamb, R. J. et al. Resistance to Sitodiplosis mosellana (Diptera: Cecidomyiidae) in spring wheat (Gramineae). Can. Entomol. 132, 591–605 (2000).
74. la Cour, T. et al. Analysis and prediction of leucine-rich nuclear export signals. Protein Eng. Des. Sel. 17, 527–536 (2004).

Acknowledgements We are grateful for funding from the Canadian Triticum Applied Genomics research project (CTAG2) funded by Genome Canada, Genome Prairie, the Western Grains Research Foundation, Government of Saskatchewan, Saskatchewan Wheat Development Commission, Alberta Wheat Commission, Viterra and Manitoba Wheat and Barley Growers
Association. Funding was also provided by the Biotechnology and Biological Sciences Research Genome assemblies were contributed as follows: CDC Stanley and CDC Landmark: P.J.H.,
Council (BBSRC) via the projects Designing Future Wheat (BB/P016855/1), sLOLA (BB/ C.J.P., A.G.S., B.B., C.S.K., A.N., K.N. and S.W.; Julius: K.F.X.M., N.S., M.M., C.M. and U.S.;
J003557/1) and MAGIC Pangenome (BB/P010741/1, BB/P010733/1 and BB/P010768/1), by AMED Jagger: G.M., J.P. and L.G.; ArinaLrFor: B.K., S.G.K. and M.C.K.; Mace and LongReach Lancer:
NBRP (JP17km0210142), the German Federal Ministry of Education and Research (FKZ K.C., P.L., G.K.-G. and J.T.; Norin 61: K.K.S., H.H., S.N., J.S., K. Kawaura, H.T., T. Tameshige, T.B.,
031B0190, WHEATSeq, 2819103915 and 2819104015), German Network for Bioinformatics and D.C., M.H., R.S.-I., C.A., F.K., J.G.-G. and N.S.; SY Mattis: E.L. and A.B.; spelt (PI190962): A.D.,
Infrastructure de.NBI (FKZ 031A536A, 031A536B), German Federal Ministry of Food and C.J.P. and J.D.; Robigus, Claire, Paragon and Cadenza: M.B., M.C., B.C., C.F., N.F. and D.H.;
Agriculture (BMEL FKZ 2819103915 WHEATSEQ), Israel Science Foundation (Grant 1137/17), JST Weebill 1: M.C., B.C., J.C., K.A.G., L.P.-A. and L.V. Sequencing, assembly and analysis were
CREST (JPMJCR16O3), US National Science Foundation (1339389), Kansas Wheat Commission contributed by WRA2P computational assembly: A. Hall, B.C., G.G.A., K. Krasileva, N.M.,
and Kansas State University, MEXT KAKENHI, The Birth of New Plant Species (JP16H06469, D.S. and J. Wright; 10X Genomics: H.B., C.J.P., J.E., S.K. and K.W.; Hi-C and structural
JP16H06464, JP16H06466 and JP16K21727), National Agriculture and Food Research analysis: M.M., N.S., A. Himmelbach, C.M., S.P. and L.G.; pseudomolecule assemblies: M.M.,
Organization (NARO) Vice President Fund, Swiss Federal Office of Agriculture (NAP-PGREL), C.M. and N.S.; gene projections and TE analysis: K.F.X.M., M.S., H.G. and G.H.; diversity and
Agroscope, Delley Seeds and Plants, ETH Zurich Institute of Agricultural Sciences, Fenaco polymorphism analysis: K.K.S., E.D., T.P., G.H.-N., D.C., M.H., G.H., H.H., H.K., M.S., K.M., T.
Co-operative, IP-SUISSE, swisssem, JOWA, SGPV-FSPC, Swiss National Science Foundation Tameshige, T. Tanaka, J.S. and J. Wu; centromere diversity: J.P. and D.H.K.; 5B/7B
(31003A_182318 and CRSII5_183578), University of Zurich Research Priority Program Evolution in translocation: S.G.K., T.W., J.C. and M.C.K; 2NvS introgression: J.P., A.K.F., L.G., P.J., C.J.P., R.S.
Action, King Abdullah University of Science and Technology, Grains Research and Development and S.W.; TE-based introgressions: T.W., B.B., J.E., M.C.K., J.P., C.J.P., J.T. and S.W; cytological
Corporation (GRDC), Australian Research Council (CE140100008) and Groupe Limagrain. We karyotyping: S.N., K.M., Y.N., J.S. and T.K.; diversification of Rf genes: J.M. and I.S.; NLR
are grateful for the computational support of the Functional Genomics Center Zurich, the repertoire: S.G.K. and B.S.; Sm1 gene cloning: C.A.M., C.J.P., C.U., J.B., A.C.C., S.C., P.F.,
Molecular Plant Breeding Group—ETH Zurich, and the Global Institute of Food Security (GIFS), M.T.K., V.K., D.T. and K.W.; haplotype database: C.U., J.B. and R.H.R.-G.; visualization software:
Saskatoon. We acknowledge the contribution of the Australian Wheat Pathogens Consortium C.G., V.B., G.K.-G., J.N.S., J.T. and J.M.; BLAST server: M.M., A.F. and U.S.; C.J.P and S.W.
(https://data.bioplatforms.com/organization/edit/bpa-wheat-cultivars) in the generation of data drafted the manuscript with input from all authors. All co-authors contributed to and edited
used in this publication. The Initiative is supported by funding from Bioplatforms Australia the final version.
through the Australian Government National Collaborative Research Infrastructure Strategy
(NCRIS). We thank S. Wu for DNA preparations for assembly and ChIP–seq library preparations; Competing interests The authors declare no competing interests.
O. Francisco-Pabalan and J. Santos, T. Wisk and S. Wolfe for their provision of OWBM images;
M. Knauft, I. Walde, S. König, T. Münch, J. Bauernfeind and D. Schüler for their contribution to Additional information
Hi-C data generation and sequencing, DNA sequencing and IT administration and sequence Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-
data management; J. Vrána for karyotyping the wheat cultivars Arina and Forno; and R. Regier 2961-x.
for project management, administration and support. Correspondence and requests for materials should be addressed to C.A.M., M.S., T.W. and C.J.P.
Peer review information Nature thanks Victor Albert, Rudi Appels and the other, anonymous,
Author contributions Project establishment: K.C., A.D., A. Hall, B.K., S.G.K., E.L., P.L., reviewer(s) for their contribution to the peer review of this work.
K.F.X.M., J.P., C.J.P., K.K.S., M.S. and N.S. Project coordination: A. Hall, C.J.P. and N.S. Reprints and permissions information is available at http://www.nature.com/reprints.
Article
Extended Data Fig. 1 | Chromosome-scale collinearity between the RQA. Genomes were aligned chromosome by chromosome using MUMmer and are represented as dot plots. The introgression on chromosome 2B of LongReach Lancer (red rectangles) and 5B/7B translocation in SY Mattis and ArinaLrFor (purple rectangles) are indicated.
Extended Data Fig. 2 | Evaluation of the CDC Landmark RQA using Oxford Nanopore Long Reads. a, Scaffold-scaffold long read contact map showing shared read IDs between scaffold ends along the ordered scaffolds in the CDC Landmark pseudomolecules. The diagonal pattern indicates that adjacent scaffolds share the same long reads and are therefore properly ordered and oriented by Hi-C in the RQA. b, Characterization of inversion events on chromosomes 2A, 3A, and 3D. The directionality biases estimated from alignments of Hi-C data against Chinese Spring (left, top), and chromosome alignment of the inversion events between CDC Landmark and Chinese Spring RQAs (left, bottom) are shown. Long reads spanning the inversion events and magnified views of the reads aligning to the left and right boundaries of the inversions (right) are provided.
Extended Data Fig. 3 | Diversity of genes and TEs. a, Average pairwise genetic diversity of the homeologues (coding sequences only) of the A, B and D subgenomes. The mode of the A, B and D subgenome is 0.00057, 0.00082, and 0.0002, respectively. b, Tajima’s D estimates of coding sequences for each wheat subgenome. The lower and upper range of the boxplot hinges correspond to the first and third quartiles (the 25th and 75th percentiles). Boxplots show centre line, median; box limits, upper and lower quartiles; whiskers, 1.5 × interquartile range. c, Total gene counts and orthologues for the RQA. Genes in orthologous groups with exactly one gene for each line (Complete; dark brown), genes contained in unambiguous orthologous groups missing an orthologue for at least one line, that is, PAV (2-10 Lines; light brown), and genes with ambiguous orthologues or CNV (Other; pink) are indicated. d, Per cent of pairwise shared syntenic fl-LTRs between wheat lines.
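For readers unfamiliar with the two statistics named in panels a and b, the following is a minimal sketch of how pairwise diversity and Tajima's D can be computed from a set of aligned sequences. It is an illustration only and is not the authors' pipeline: the function name `tajimas_d` and the toy alignment are hypothetical, the alignment is assumed to be gap-free with no missing data, and pairwise diversity is returned per locus (dividing by the alignment length would give a per-site value).

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Return (pi, S, D) for equal-length aligned sequences.

    pi is the mean number of pairwise differences per locus, S is the
    number of segregating sites, and D is Tajima's D (None when S == 0).
    Hypothetical helper for illustration; assumes no gaps or missing data.
    """
    n = len(seqs)
    if n < 4:
        # The variance term of D is zero for n < 4, so D is undefined.
        raise ValueError("need at least 4 sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")

    # Segregating sites: alignment columns with more than one state.
    seg_sites = sum(1 for col in zip(*seqs) if len(set(col)) > 1)

    # Mean number of pairwise differences over all n(n-1)/2 pairs.
    n_pairs = n * (n - 1) // 2
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    pi = total_diffs / n_pairs
    if seg_sites == 0:
        return pi, 0, None

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    # D compares pi with Watterson's estimator S / a1.
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    d = (pi - seg_sites / a1) / sqrt(variance)
    return pi, seg_sites, d


# Toy alignment in which every variant is a singleton.
toy = ["AAAA", "AAAT", "AATA", "ATAA"]
toy_pi, toy_s, toy_d = tajimas_d(toy)
```

In the toy alignment every variant is carried by a single sequence, so the statistic is negative; an excess of intermediate-frequency variants would push it positive.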
Extended Data Fig. 4 | Evolutionary relationships among PPR and mTERF gene sequences. a, The RFL clade is in blue and all remaining P-class PPRs are in green. b, Clustered mTERF sequences are in blue and the remaining mTERFs are shown in green. The scale bar represents number of substitutions per site. c, Sequence inversions and copy number variation at the Rf3 locus on chromosome 1B. RFL genes are shown as light pink triangles above the chromosome scale. Conserved non-PPR genes used as syntenic anchors are shown on the chromosome scale as coloured triangles. The total number (T) and the number of putatively functional RFL genes with 10 or more PPR motifs (F) are indicated on the right side of each panel.
Extended Data Fig. 5 | Identification of alien introgressions from wheat relatives. A feature of foreign chromosomal introgressions is that they contain unique patterns of TE insertions. Shown are stretches of >20 Mb containing multiple polymorphic RLC-Angela retrotransposons that are found only in one or a few (≤4) of the sequenced lines. One representative chromosome for each wheat subgenome is shown. Individual polymorphic retrotransposons are indicated as coloured vertical lines. Colours correspond to the number of cultivars a foreign segment is found in. Regions of particular interest are indicated by black rectangles. These include the 2NvS alien introgression from A. ventricosa at the end of chromosome 2A in Jagger, Mace, SY Mattis and CDC Stanley, as well as introgression in the central region of chromosome 2B from T. timopheevii in LongReach Lancer, and introgression at the end of chromosome 3D from T. ponticum in LongReach Lancer.
Extended Data Fig. 6 | Detailed characterization of the 2NvS introgression from A. ventricosa. a, Pairwise alignments of the first 50 Mb of chromosome 2A. The black arrow indicates a possible unique haplotype within spelt. b, Orthologous genes between the 2NvS introgression from A. ventricosa in Jagger and the genes on chromosomes 2A, 2B, and 2D in Chinese Spring. c, Frequency of 2NvS introgression carriers in North American datasets from CIMMYT, Kansas State, and the USDA Winter Wheat Regional Performance Nursery (RPN) over time. d, Per cent yield difference in lines that carry the 2NvS introgression. Two sided t-tests were performed to test for the significance of the impact of the 2NvS introgression. **P < 0.01; ***P < 0.001.
Extended Data Fig. 7 | Centromere positions and karyotype variation. Functional centromere positions in the RQA have undergone structural and positional rearrangement. Chromosome alignments showing collinearity (black scaffolds in same orientation, grey scaffolds in opposite orientation) with relative density of CENH3 ChIP–seq mapped to 100 kb genomic bins for Chinese Spring (blue) and a representative genome of comparison (red) for chromosome 4B of CDC Stanley (a), and chromosome 5B of Julius (b). c, Detailed list and clustering of cytological features carried by each wheat line (Supplementary Note 6). Features that are identical (dark grey) or have a gain (black) or loss (light grey) relative to Chinese Spring are indicated.
Extended Data Fig. 8 | Hi-C validates inversions identified from pairwise chromosome alignments. Pairwise alignments of chromosome 6B from the RQA and Chinese Spring are shown. Above each alignment dot plot, the directionality biases estimated from alignments of Hi-C data against Chinese Spring are shown. Boundaries of diagonal segments are indicative of inversions and coincide with inversion boundaries identified from the chromosome alignments.
Extended Data Fig. 9 | Characterization of a translocation involving wheat chromosomes 5B and 7B. a, Cytogenetic karyotypes of Forno (left) and Arina (right), the parents of ArinaLrFor. Note that the large recombinant chromosome 7B is represented by a distinct peak. b, Sequence of the translocation breakpoint on chromosome 7B of ArinaLrFor. Note that the exact breakpoint lies in a sequence gap (stretch of Ns). The bp positions are indicated at the left. Forward PCR primers are shown in red and reverse primers in blue. The overlap of the two reverse primers is shown in purple. The outer primer pair was used for PCR, while the inner pair was used for a nested PCR. c, PCR amplification of the fragment spanning the translocation breakpoint. The nested PCR yielded a ~5 kb fragment that spanned the translocation breakpoint and its identity was confirmed by sequencing. Both PCR and nested PCR were performed in duplicate; both replicates of the nested PCR were sequenced using the Sanger method. For gel source data, see Supplementary Fig. 1. d, Mapping of Illumina reads from the cultivars Arina and Forno on to the pseudomolecules of ArinaLrFor. Sequence derived from Forno is shown in blue, while sequence derived from Arina is in red. Note that chromosomes 5B and 7B are derived from both parents, indicating that these parental chromosomes can recombine freely, despite the presence of a large 5B/7B translocation in Arina.
Extended Data Fig. 10 | Confirmation of gene expression and gene structure for Sm1. a, Critical recombinants from the 99B60-EJ2G/Infinity and 99B60-EJ2D/Thatcher populations used to fine map Sm1. The 99B60-EJ2G/Infinity cross had 5,170 F2 plants, while 99B60-EJ2D/Thatcher cross had 5,264 F2 plants; only recombinant haplotypes between orange wheat blossom midge resistant (R) and susceptible (S) genotypes are shown. b, Oxford Nanopore long read confirmation of the Sm1 gene candidate in the CDC Landmark RQA (left), and alternative haplotype in Chinese Spring (right). Vertical coloured lines indicate sequence variants. c, Amplification of cDNA for the NB-ARC domain of the Sm1 gene candidate (top) and actin control (bottom) derived from RNA isolated from developing kernels (left) and wheat seedlings (right). Unity and CDC Landmark are carriers of Sm1. Waskada carries an alternative haplotype and does not carry Sm1 (see main text). Thatcher was used as a susceptible parent for fine mapping of Sm1 and does not contain the associated NB-ARC domain. The experiment was replicated on four independent biological samples for each condition. d, Distribution of an Sm1 allele-specific PCR marker in a diverse panel of >300 wheat lines.
Corresponding author(s): Curtis Pozniak
Reporting Summary

Software and code

Data collection No software was used to collect data for this study.
Data analysis A multitude of software and databases were used in this study, all of which have been listed, cited, or provided. These include:
DeNovoMAGIC v3.0, W2RAP (no version, https://github.com/bioinfologics/w2rap), LongRanger v2.1.6, GATK v3.8, R v3.6.1 and v3.0.2, BLAT v3.5, BLAST v2.8, MUSCLE v3.8, libsequence v1.8.3, EMBOSS v6.6.0, HMMER 3.1b2, PFAM v32.0, NLR-Annotator (no version, https://github.com/steuernb/NLR-Annotator), Vmatch v2.3.0, TandemRepeatFinder v4.07b, LTRharvest genometools-1.5.9, HMMER v3.0, MUMmer v3.23 (haplotype database) and v4 (all other analyses), HISAT v2.1.0, SNPrelate v3.11, BBTools/BBMap v38, ImageJ v1.51n, minimap2 v2.13, FGENESH v2.6, NCBI Conserved Domain Search tool (no version, https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb), PROSITE release 2020_01, TMpred v25, STAR v2.6.0b, AUGUSTUS v3.2.3, GMAP v2017-06-20, EvidenceModeler v1.1.1, AHRD v1.6, MCScanX v2.0, samtools v1.10, BEDtools v2.29, and custom data scripts (https://github.com/Uauy-Lab/pangenome-haplotypes; http://people.beocat.ksu.edu/~jpoland/centromeres/).
Data
October 2018

All sequence reads have been deposited into the National Center for Biotechnology Information sequence read archive (SRA) (see Supplementary Table 1 for accession numbers). Sequence reads for the RQAs, Th. ponticum, Ae. ventricosa and T. timopheevii have been deposited into the SRA (no. PRJNA544491) and ChIP-seq short-read data used for centromere characterization are deposited as PRJNA625537. All Hi-C data have been deposited in the European Nucleotide Archive (Supplementary Table 1). The RQAs and projected annotations are available for direct user download at https://wheat.ipk-gatersleben.de/. All RQA assemblies have also been deposited at EBI with the following accession numbers: GCA_903993795; GCA_903993985; GCA_903993975; GCA_903994175; GCA_903994195; GCA_904066035; GCA_903994155; GCA_903994165; GCA_903994185; GCA_903995565. These data will be synchronized across multiple platforms including NCBI and Ensembl Plants (https://plants.ensembl.org/index.html). Comparative analysis viewers are also online for synteny (https://kiranbandi.github.io/10wheatgenomes/; http://10wheatgenomes.plantinformatics.io/) and haplotypes (http://www.crop-haplotypes.com/). Seed stocks of the assembled lines are available at the UK Germplasm Resources Unit (https://www.seedstor.ac.uk/).
Life sciences study design

Sample size No statistical methods were used to establish sample size. The samples that were sequenced were selected to represent modern breeding
material from different continents that had known differences in pedigree and were known to carry different genes/traits/chromosomal
segments of interest.
Data exclusions All sequencing data generated were used in the genome assembly and analyses. Whenever possible, all data were included in the supporting analyses. Data exclusion applies only to some of the subsequent supporting analyses and was pre-established based on limitations in the data. For example, we excluded the scaffolded assemblies from some analyses because the analyses required chromosome pseudomolecules. We performed diversity analysis both with and without the spelt genome, because spelt is a different species, is much more diverged and biased the results.
Replication In all analyses that support the genome assemblies, the number of replicates or iterations is indicated in materials and methods or supplemental tables. In each case, all replications were successful and were used. The genome assemblies themselves were validated using multiple methods (i.e. BUSCO, genetic maps, Hi-C, 10x Genomics, cytology, and comparisons to Chinese Spring). The CDC Landmark assembly was further validated using Oxford Nanopore long read sequencing. This helped validate the other approaches.
Randomization Randomization does not directly apply to the genome sequencing and assembly; however, it applies to some of the supporting analyses. In these cases, the group design and data seeding for computational analysis are described in the materials and methods and adhere to widely accepted standards. For example, in the analysis of NLRs (Fig. 1c), 1 million random permutations were used. For the field experiments established for phenotyping of Sm1, all samples were replicated and randomized using appropriate experimental designs.
Blinding Blinding does not apply to this study, which focuses on plant genome sequencing; the results of the study are not affected by the concealment of treatment, data, or groups.


Antibodies
Antibodies used Chromatin immunoprecipitation (ChIP) was performed using wheat CenH3 antibody (Koo et al., 2015). An antigen with the peptide sequence ‘RTKHPAVRKTKALPKK’, corresponding to the N-terminus of wheat CENH3, was used to produce the antibody at the custom-antibody production facility of Thermo Fisher Scientific, Illinois, USA (abs@thermofisher.com). A 0.396 mg antibody pellet was dissolved in 2 ml of PBS buffer, pH 7.4, giving a working concentration of 198 ng/µL.
Validation In the manuscript, we validate the antibody according to a previous study of Chinese Spring (Koo et al., 2015) and achieved near-identical results (Supplementary Table 12). Additional controls were used in the study where the antibody was substituted with rabbit serum, which serves as a nonspecific binding control in the chromatin immunoprecipitation assay.
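As a quick check of the working concentration quoted for the antibody, the unit conversion can be written out as follows (the figures are those given in the form; only the intermediate steps are added):

```latex
\[
\frac{0.396\ \text{mg}}{2\ \text{ml}}
  = 0.198\ \text{mg ml}^{-1}
  = 198\ \mu\text{g ml}^{-1}
  = 198\ \text{ng}\ \mu\text{l}^{-1}
\]
```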

ChIP-seq
Data deposition
Confirm that both raw and final processed data have been deposited in a public database such as GEO.
Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks.
Data access links The data for the project has been deposited at NCBI: PRJNA625537 and analysis files are available for download: http://people.beocat.ksu.edu/~jpoland/centromeres/
Files in database submission BED files, delta files (MUMmer), data analysis scripts
Genome browser session (e.g. UCSC) Data for visualization is available at http://people.beocat.ksu.edu/~jpoland/centromeres/
Methodology
Replicates NA. Samples were obtained from 2-week-old seedlings.
Sequencing depth Paired-end reads were generated at varying levels of read depth; the data were deposited at NCBI (PRJNA625537).
Antibodies Wheat CenH3 antibody - see: Koo DH, Sehgal SK, Friebe B, Gill BS (2015) Structure and stability of telocentric chromosomes
in wheat. PLoS One 10: e0137747.
Peak calling parameters Reads mapped per 100kb bin were counted for each sample using BEDtools and output as a bed file. Scripts for data
analysis are provided at http://people.beocat.ksu.edu/~jpoland/centromeres/. Unlike studies involving transcription factors,
CENH3 ChIP-seq provides clear distinct peaks that are ~100 fold greater than background.
Data quality SAM output files from HISAT2 were converted to BAM, sorted and filtered for minimum alignment quality of 30 using
SAMtools.
Software Reads for each sample were aligned to each of the respective genome assemblies using HISAT2. Reads mapped per 100 kb bin were counted for each sample using BEDtools and output as a BED file. Scripts for data analysis are provided at http://people.beocat.ksu.edu/~jpoland/centromeres/.
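The filtering and binning described in the Data quality, Peak calling parameters and Software rows can be sketched in a few lines. This is a pure-Python illustration under stated assumptions, not the authors' scripts (which used SAMtools and BEDtools and are available at the URL above): the function names `count_reads_per_bin` and `to_bed` and the toy SAM records are hypothetical, and only the 100 kb bin size and the minimum alignment quality of 30 are taken from the text.

```python
from collections import Counter

BIN_SIZE = 100_000  # 100 kb bins, as stated in the form
MIN_MAPQ = 30       # minimum alignment quality, as stated in the form


def count_reads_per_bin(sam_lines, bin_size=BIN_SIZE, min_mapq=MIN_MAPQ):
    """Count mapped reads per (chromosome, bin start) from SAM text lines.

    Hypothetical helper for illustration: each read is assigned to the
    bin containing its leftmost aligned position.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        chrom = fields[2]
        pos = int(fields[3])   # 1-based leftmost position
        mapq = int(fields[4])
        if flag & 0x4 or chrom == "*":
            continue  # unmapped read
        if mapq < min_mapq:
            continue  # below the alignment-quality threshold
        bin_start = (pos - 1) // bin_size * bin_size  # 0-based bin start
        counts[(chrom, bin_start)] += 1
    return counts


def to_bed(counts, bin_size=BIN_SIZE):
    """Format bin counts as BED-like rows: chrom, start, end, read count."""
    return [
        f"{chrom}\t{start}\t{start + bin_size}\t{n}"
        for (chrom, start), n in sorted(counts.items())
    ]


# Toy SAM records (hypothetical reads on a hypothetical contig).
toy_sam = [
    "@SQ\tSN:chr1A\tLN:500000",
    "r1\t0\tchr1A\t150\t60\t50M\t*\t0\t0\t*\t*",
    "r2\t16\tchr1A\t99999\t42\t50M\t*\t0\t0\t*\t*",
    "r3\t0\tchr1A\t100001\t30\t50M\t*\t0\t0\t*\t*",
    "r4\t0\tchr1A\t100500\t12\t50M\t*\t0\t0\t*\t*",  # fails the MAPQ filter
    "r5\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",              # unmapped
]
toy_counts = count_reads_per_bin(toy_sam)
toy_bed = to_bed(toy_counts)
```

Dividing each bin's count by the genome-wide median bin count would give a simple enrichment value of the kind the form alludes to when it describes CENH3 peaks as roughly 100-fold above background; that normalization is an assumption here, not a documented step.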
Article
The barley pan-genome reveals the hidden legacy of mutation breeding

https://doi.org/10.1038/s41586-020-2947-8
Accepted: 9 September 2020
Open access

Murukarthick Jayakodi1,20, Sudharsan Padmarasu1,20, Georg Haberer2, Venkata Suresh Bonthala2, Heidrun Gundlach2, Cécile Monat1, Thomas Lux2, Nadia Kamal2, Daniel Lang2, Axel Himmelbach1, Jennifer Ens3, Xiao-Qi Zhang4, Tefera T. Angessa4, Gaofeng Zhou4,5, Cong Tan4, Camilla Hill4, Penghao Wang4, Miriam Schreiber6, Lori B. Boston7, Christopher Plott7, Jerry Jenkins7, Yu Guo1, Anne Fiebig1, Hikmet Budak8, Dongdong Xu9, Jing Zhang9, Chunchao Wang9, Jane Grimwood7, Jeremy Schmutz7, Ganggang Guo9, Guoping Zhang10, Keiichi Mochida11,12,13, Takashi Hirayama13, Kazuhiro Sato13, Kenneth J. Chalmers14, Peter Langridge14, Robbie Waugh6,14,15, Curtis J. Pozniak3, Uwe Scholz1, Klaus F. X. Mayer2,16, Manuel Spannagl2, Chengdao Li4,5,17 ✉, Martin Mascher1,18 ✉ & Nils Stein1,19 ✉

Genetic diversity is key to crop improvement. Owing to pervasive genomic structural
variation, a single reference genome assembly cannot capture the full complement of
sequence diversity of a crop species (known as the ‘pan-genome’1). Multiple
high-quality sequence assemblies are an indispensable component of a pan-genome
infrastructure. Barley (Hordeum vulgare L.) is an important cereal crop with a long
history of cultivation that is adapted to a wide range of agro-climatic conditions2. Here
we report the construction of chromosome-scale sequence assemblies for the
genotypes of 20 varieties of barley—comprising landraces, cultivars and a wild
barley—that were selected as representatives of global barley diversity. We catalogued
genomic presence/absence variants and explored the use of structural variants for
quantitative genetic analysis through whole-genome shotgun sequencing of
300 gene bank accessions. We discovered abundant large inversion polymorphisms
and analysed in detail two inversions that are frequently found in current elite barley
germplasm; one is probably the product of mutation breeding and the other is tightly
linked to a locus that is involved in the expansion of geographical range. This
first-generation barley pan-genome makes previously hidden genetic variation
accessible to genetic studies and breeding.
A staple food of ancient civilizations, today barley is used mainly for animal feed and malting. Barley is more adaptable to harsh environmental conditions than its close relative wheat, and maintains an important role in human nutrition in harsh climatic regions that include the Ethiopian and Tibetan highlands2. As in other crops, genomics has been a major driver of progress in barley genetics and breeding in the past decade3. The first draft reference genome for barley4, and its subsequent revisions5,6, have formed the basis for gene isolation7, compiling a single-nucleotide polymorphism (SNP) variation atlas for wild and domesticated germplasm8, and activating plant genetic resources9. At the same time, reduced-representation surveys of structural variation10 and map-based cloning11 have implicated variation in gene content and copy number in the control of agronomic traits. The concept of the pan-genome refers to a species-wide catalogue of genic presence/absence variation (PAV)12, or more generally, structural variation that affects (potentially non-coding) sequences of 50 or more base pairs (bp) in size. Although several methods of pan-genomic analysis that use short-read sequence data in the context of a single reference genome have been devised13, large and complex genomes require multiple high-quality sequence assemblies to capture and contextualize sequences that are absent in, or highly diverged from, a single reference genotype14. Progress in sequencing and genome mapping technologies has only recently made possible the fast and cost-effective assembly of tens of genotypes of
1Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany. 2Plant Genome and Systems Biology (PGSB), Helmholtz Center Munich, German Research Center for Environmental Health, Neuherberg, Germany. 3Department of Plant Sciences, University of Saskatchewan, Saskatoon, Saskatchewan, Canada. 4Western Barley Genetics Alliance, State Agricultural Biotechnology Centre, College of Science, Health, Engineering and Education, Murdoch University, Murdoch, Western Australia, Australia. 5Agriculture and Food, Department of Primary Industries and Regional Development, South Perth, Western Australia, Australia. 6The James Hutton Institute, Dundee, UK. 7HudsonAlpha, Institute for Biotechnology, Huntsville, AL, USA. 8Montana BioAg Inc, Missoula, MT, USA. 9Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (ICS-CAAS), Beijing, China. 10College of Agriculture and Biotechnology, Zhejiang University, Hangzhou, China. 11Bioproductivity Informatics Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Japan. 12Kihara Institute for Biological Research, Yokohama City University, Yokohama, Japan. 13Institute of Plant Science and Resources, Okayama University, Kurashiki, Japan. 14School of Agriculture, Food and Wine, University of Adelaide, Glen Osmond, South Australia, Australia. 15School of Life Sciences, University of Dundee, Dundee, UK. 16School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany. 17Hubei Collaborative Innovation Centre for Grain Industry, Yangtze University, Jingzhou, China. 18German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany. 19Center for Integrated Breeding Research (CiBreed), Georg-August-University Göttingen, Göttingen, Germany. 20These authors contributed equally: Murukarthick Jayakodi, Sudharsan Padmarasu. ✉e-mail: C.Li@murdoch.edu.au; mascher@ipk-gatersleben.de; stein@ipk-gatersleben.de

[Fig. 1 image not reproduced. Panel a: PCA scatter plot with PC3 (2%) on the x axis and PC4 (1.4%) on the y axis; legend entries: Akashinriki, Barke, Golden Promise, HOR 10350, HOR 13821, HOR 13942, HOR 21599, HOR 3081, HOR 3365, HOR 7552, HOR 8148, HOR 9043, Hockett, Igri, Morex, OUN333, RGT Planet, ZDM01467, ZDM02064. Panel b: dot plot with Morex position (Mb) on the x axis and Barke position (Mb) on the y axis, both spanning 0 to 600.]

Fig. 1 | Chromosome-scale sequences of 20 representative barley genotypes reveal large structural variants. a, We selected 20 barley genotypes to represent the genetic diversity space, as revealed by PCA of genotyping-by-sequencing data of 19,778 domesticated varieties of barley9. Principal component (PC)3 and PC4 are shown. The proportion of variance explained by the principal components is indicated in the axis labels. Further principal components are shown in Extended Data Fig. 1a. b, Alignment of the pseudomolecules of chromosome 2H of the Morex and Barke cultivars. The inset zooms in on a 10-Mb inversion that is frequently found in germplasm from northern Europe. Co-linearity plots for all assemblies and chromosomes are shown in Extended Data Fig. 3a.
large-genome plant species, such as barley (haploid genome size of 5 Gb)15.

Twenty barley reference genomes
The starting point for pan-genomics in barley was the comprehensive survey of species-wide diversity on the basis of the genome-wide genotyping of more than 22,000 barley accessions, mainly from the German national gene bank9. To achieve a good representation of major barley gene pools, we selected accessions that were located in the branches of the first six principal components from the previously published principal component analysis (PCA)9 (Fig. 1a, Extended Data Fig. 1), reflecting the key determinants of population structure: geographical origin, row type and annual growth habit. In addition to these gene pool representatives, our panel included the reference cultivar Morex5, two current or former elite malting varieties (RGT Planet and Hockett), two founder lines of Chinese barley breeding (ZDM01467 and ZDM02064), Golden Promise and Igri (two genotypes with high transformation efficiency16,17), Barke (a successful German variety and the parent of several mutant and mapping populations18,19) and one wild barley (H. vulgare subsp. spontaneum (K. Koch) Thell.) genotype from Israel (B1K-04-12, a desert ecotype collected at Ein Prat)20.
We constructed chromosome-scale sequence assemblies for 20 accessions (Extended Data Table 1). In brief, paired-end and mate-pair Illumina short reads were assembled into scaffolds of megabase (Mb)-scale contiguity (Extended Data Table 1). Scaffold assembly was done with Minia21 and SOAPDenovo22 following the TRITEX method6 (n = 16), DeNovoMagic from NRGene (n = 3) or W2rap23 (n = 1). We used 10X Genomics Chromium linked-reads and chromosome conformation capture (Hi-C) data to arrange scaffolds into chromosomal pseudomolecules using the TRITEX pipeline6 (Extended Data Table 1). A comparison of the short-read assembly of the Morex cultivar to a long-read assembly of this genotype generated from PacBio long reads showed high co-linearity at chromosomal scale, good concordance in gene space representation and similar power to detect PAV (Extended Data Fig. 2), indicating that short-read assemblies are amenable to pan-genomic analyses in barley. Although the assemblies of the 20 diverse accessions differed in contiguity and the extent of gap sequence in the intergenic space, they had a similar representation of reference gene models (Morex V2) and were highly co-linear to each other at the whole-chromosome scale (Fig. 1b, Extended Data Fig. 3). A similar proportion (about 80%) of the assembled sequence of each genotype was composed of transposable elements, with an average of 113,200 intact full-length long-terminal repeat retro-elements (LTRs) per assembly (Supplementary Table 1). However, we found pronounced differences in the number of shared intact full-length LTR locations: only 17 to 25% of full-length LTR locations present in the wild barley B1K-04-12 were shared at 98% sequence identity and 98% alignment coverage with any domesticated genotype (Extended Data Fig. 4). By contrast, more closely related domesticated genotypes shared between 53% and 67% of their full-length LTRs, consistent with previous reports of rapid sequence turn-over in the non-coding space in large-genome plant species24,25.
De novo gene annotation using Illumina RNA sequencing and PacBio Iso-Seq data (Supplementary Table 2) was performed for three genotypes: Morex (which has previously been reported6), Barke and the Ethiopian landrace HOR 10350 (Extended Data Fig. 5). Gene models defined on the basis of these three assemblies were consolidated and projected onto the remaining 17 assemblies (Extended Data Fig. 5). Between 35,859 and 40,044 gene models were annotated by projection in each assembly (Extended Data Table 1) with an average of 37,515 (s.d. = 896). The number of gene models was about 20% higher in the projections than in de novo annotations (Extended Data Fig. 5e), which indicates that some of the models lack transcript support: possible explanations for the discrepancy are highly tissue-specific expression or pseudogenization. The clustering of orthologous gene models yielded 40,176 orthologous groups. Of these, 21,992 occurred as a single copy in all 20 assemblies; 3,236 occurred in multiple copies in at least one of the 20 assemblies; 13,188 were absent from at least one assembly; and 1,760 were present in only one assembly. On average, 14.7% of gene models annotated in each assembly occurred in tandem arrays that comprised two or more adjacent copies. These results point to abundant genic copy-number variation between barley genotypes. Future transcriptomic studies will ascertain the effect of structural variants on gene expression.

Pan-genome as a tool for genetics and breeding
High-quality genome assemblies are a resource for ascertaining and providing context to structural variants, which can then be genotyped in a wider set of germplasm using low-coverage or reduced-representation sequence data. We used two complementary approaches to detect structural variation: assembly comparison and clustering of single-copy sequences to derive markers that can be scored in short-read data. We used the Assemblytics26 software to discover PAV by pair-wise comparison of 19 chromosome-scale assemblies to the Morex reference. We identified 1,586,262 PAVs, ranging in size from 50 to 999,568 bp, and observed an enrichment for low-frequency variants (Extended Data Fig. 6a, b). PAV density was higher in distal, gene-rich regions (Extended

[Figure 2 graphic: a, single-copy pan-genome size (Mb) by genotype; b, –log10(P value) by chromosome (1H to 7H), for markers absent or present in Morex; c, 16,682-bp region on Morex 7H (529.04 to 529.07 Mb) compared with HOR 7552 7H, with breakpoints, Nud and the deletion marked; d, normalized k-mer count for hulled and naked varieties.]
Fig. 2 | Single-copy pan-genome and use of PAVs in association mapping. a, Cumulative size of single-copy regions in genome assemblies of 20 barley genotypes. The genotypes were ordered according to the size of their unique single-copy sequence. b, Genome-wide association scan for lemma adherence on the basis of PAV markers. The black and red dots in the Manhattan plot denote single-copy sequences that are present and absent in Morex, respectively. c, The most highly associated PAV marker was a 16.7-kb region that is deleted in the naked accession HOR 7552 and that contains the NUD gene11. d, Allelic status of the NUD deletion in 196 domesticated varieties of barley. Normalized single-copy k-mer counts within the 16.7-kb region are shown for hulled (n = 160 genotypes) and naked varieties (n = 36 genotypes).
Data Fig. 6c), which are characterized by higher nucleotide diversity and recombination rates8. A total of 5,446 out of 5,602 deletions longer than 5 kilobases (kb) found in Barke relative to Morex were mapped genetically in the 90 recombinant inbred lines of the Morex × Barke population19 with highly concordant positions (Spearman correlation = 0.99) (Extended Data Fig. 6d), which provides support for the accuracy of the detected polymorphisms. At least one member of 18,562 (46%) groups of orthologous genes overlapped with structural variants discovered in the 20 sequence assemblies. As observed in other plant species27, resistance-gene homologues containing NB-ARC and protein kinase domains were frequently found among PAV genes (Supplementary Table 3).

Structural variants cover non-genic regions composed of repetitive sequence, making it hard to establish orthologous relationships or the presence of specific alleles from short-read data only. To derive quantitative estimates of the extent of pan-genomic variation and to provide a tool for genetic analysis such as association scans, we focused on single-copy regions extracted from each of the 20 assemblies and clustered into a non-redundant set of sequences (hereafter referred to as the ‘single-copy pan-genome’) (Extended Data Fig. 7a). The average cumulative size of single-copy sequence in each accession was 478 Mb (that is, 9.5% of the assembled genome). The total size of non-redundant single-copy sequence was 638.6 Mb, represented by 1,472,508 clusters with an N50 of 1,087 bp (Extended Data Fig. 7b). The single-copy sequence shared among all 20 genotypes amounted to 402.5 Mb, whereas 235.9 Mb were variable (that is, absent or present in higher copy number in at least one assembly) (Fig. 2a). On average, each of the 20 genotypes contained 2.9 Mb of single-copy sequence not present in any other assembly. As observed for transposable element divergence, the wild barley B1K-04-12 had the highest amount of unique single-copy sequence (Extended Data Table 1).

To test the suitability of the single-copy pan-genome for genetic analysis in a wider diversity panel without high-quality genome sequences, we collected whole-genome shotgun data (threefold coverage) for 200 domesticated and 100 wild varieties of barley (Supplementary Table 4). The abundance of 160,716 single-copy clusters that overlap structural variants was estimated by counting cluster-constituent k-mers (k = 31) in sequence reads of the diversity panel. In addition, we analysed genotyping-by-sequencing data of 19,778 gene bank accessions of domesticated barley9 using the same approach. Abundance estimates based on k-mers (hereafter referred to as ‘pan-genome markers’) showed that loci detected as single-copy sequence in one genome assembly can vary in copy number from zero to many in diverse germplasm (Extended Data Fig. 7c). A PCA of pan-genome markers genotyped in whole-genome shotgun and genotyping-by-sequencing data highlighted the same drivers of global population structure as SNPs (Extended Data Fig. 7d–g). In genome-wide association scans for morphological traits, pan-genome markers revealed, with a good signal-to-noise ratio, peaks that are

[Figure 3 graphic: a, RGT Planet position (Mb) against Morex position (Mb), 0 to 600 Mb, with a 141.5-Mb region marked; b, M × B and R × H genetic position (cM) against 7H position (Mb); c, M × B and R × H recombination (cM Mb–1) against 7H position (Mb); d, samples labelled RGT Planet, Morex, Valticky and Diamant.]
Fig. 3 | Identification and characterization of a large inversion on chromosome 7H. a, Alignment of the 7H pseudomolecules of the Morex and RGT Planet cultivars. b, Alignment of physical and genetic positions mapped in the RGT Planet × Hindmarsh (R × H) (left) and Morex × Barke (M × B) (right) populations. Red shading marks the inverted region. c, We converted genetic distances to recombination rates in the R × H (left) and M × B (right) populations. A single marker per recombination block is shown. d, We designed a PCR marker (Supplementary Figs. 1, 2a) to screen for the presence of the inversion in gene bank accessions that represent the Valticky and Diamant cultivars.
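The conversion of genetic distances to recombination rates mentioned for panel c can be sketched as a difference quotient between neighbouring markers. This is an illustrative sketch only: the function name, the input layout and the toy marker positions are assumptions, and the authors' exact procedure (for example, how one marker per recombination block was selected) is not described in this section.

```python
def recombination_rates(markers):
    """Estimate recombination rates between neighbouring markers.

    markers: iterable of (physical_position_mb, genetic_position_cm) tuples.
    Returns a list of (interval_midpoint_mb, rate_cm_per_mb) tuples.
    """
    ordered = sorted(markers)  # order markers by physical position
    rates = []
    for (mb0, cm0), (mb1, cm1) in zip(ordered, ordered[1:]):
        span_mb = mb1 - mb0
        if span_mb <= 0:
            continue  # markers at the same physical position carry no rate
        # abs() keeps the rate non-negative where marker order is locally reversed
        rates.append(((mb0 + mb1) / 2, abs(cm1 - cm0) / span_mb))
    return rates


# Toy markers (hypothetical values): no recombination between 10 and 110 Mb
toy = [(0.0, 0.0), (10.0, 5.0), (110.0, 5.0), (120.0, 9.0)]
for midpoint, rate in recombination_rates(toy):
    print(f"{midpoint:6.1f} Mb  {rate:.2f} cM/Mb")
```

In the toy data, the interval between 10 and 110 Mb has a rate of zero, which is the pattern expected where an inversion suppresses recombination.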
consistent with previous reports9 (Fig. 2b, Extended Data Fig. 8). Notably, the pan-genome marker that was most highly associated with lemma adherence covered the NUDUM (NUD) gene11 (Fig. 2c). All varieties of naked barley, in which lemmas can be easily separated from grains, are thought to trace back to a single mutational event, deleting the entire NUD sequence11. Another putative knockout allele of NUD (nud1.g) that contains a likely disruptive SNP variant was recently found in Tibetan barley28. All 36 naked accessions in our panel contained the known deletion (Fig. 2d), indicating that broader sampling of barley diversity, with a particular focus on centres of (morphological) diversity, is needed to discover novel rare alleles by genomic analyses.

Compared to reference-free approaches for k-mer-based genome-wide association scans such as AgRenSeq29, trait-associated pan-genome markers are assigned with high precision to genomic positions, and aligning sequence assemblies in their vicinity provides immediate information about differences between haplotypes (Fig. 2c). Furthermore, the reduction of marker number by implicit clustering of k-mers into single-copy loci allows the use of standard mixed linear models30,31 to correct for genomic relatedness.

A map of polymorphic inversions
Chromosome-scale sequence assemblies can reveal large-scale rearrangements that are challenging to detect with other methods. Large inversions (more than 5 Mb in size) were prominent in the genome alignments of our 20 assemblies (Fig. 1b, Extended Data Fig. 3a, c). Previous reports on segregating inversions in barley are anecdotal and have focused on induced mutants32,33. To discover inversions in a broader set of germplasm, we mined patterns of contact frequencies in Hi-C data of a diversity panel mapped to a single reference genome34. Among 69 barley genotypes (67 domesticated and 2 wild accessions) (Supplementary Table 5), Hi-C-based inversion scans revealed a total of 42 events that ranged from 4 to 141 Mb in size (mean size of 23.9 Mb) (Extended Data Fig. 9a). Most of these events occurred in the low-recombining pericentromeric regions of the barley chromosomes and segregated at low frequency: 25 events were observed only once (Extended Data Fig. 9b, c, Supplementary Table 6). We focus here on two notable examples: a frequent event on chromosome 2H and an inversion in the distal region of the long arm of chromosome 7H.

The inversion in chromosome 7H detected in the RGT Planet cultivar was the largest event that segregated in our panel (141 Mb) (Fig. 3a). In a biparental mapping population derived from a cross between RGT Planet and the non-carrier cultivar Hindmarsh (Fig. 3b), this event repressed recombination in an interval that spanned 49 cM in the genetic map of the Morex × Barke population19, which is isogenic for absence of the inversion (Fig. 3c, Supplementary Table 7). We also observed a moderately distorted segregation (57% allele frequency, χ² = 4.88, P < 0.05) in favour of the Hindmarsh allele in this interval.
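A segregation-distortion statistic of this kind can be checked with a chi-square goodness-of-fit test against a 1:1 ratio. The sketch below assumes that this is the test used, which the text does not state. The allele counts are hypothetical, chosen only because they give values close to those reported; the actual number of lines scored is not given in this section.

```python
import math


def segregation_chi2(count_a, count_b):
    """Chi-square goodness-of-fit test against 1:1 segregation (1 d.f.)."""
    total = count_a + count_b
    expected = total / 2
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # With one degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 128 lines with one parental allele, 95 with the other
hindmarsh, rgt_planet = 128, 95
chi2, p_value = segregation_chi2(hindmarsh, rgt_planet)
frequency = hindmarsh / (hindmarsh + rgt_planet)
print(f"allele frequency = {frequency:.2f}, chi2 = {chi2:.2f}, P = {p_value:.3f}")
```

With these hypothetical counts the favoured allele has a frequency of about 57%, the statistic is about 4.88 and the P value falls below 0.05.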

[Figure 4 graphic: a, PC1 (2.5%) against PC2 (2.4%), carriers of the 2H inversion and other accessions; b, PC1 (9.6%) against PC2 (6.4%), inverted (domesticated), non-inverted (domesticated) and wild accessions; c, schematic of the inverted region on Morex 2H and Barke 2H, with breakpoint coordinates, 46 and 44 HC genes, and HvCEN.]
Fig. 4 | Analysis of a frequent inversion on chromosome 2H. a, A PCA showing the localization of inversion carriers in the diversity space of global domesticated barley. The correspondence of PCA coordinates to correlates of population structure is shown in Extended Data Fig. 1. Red dots denote carriers of the inverted haplotype (n = 87) in a panel of 200 domesticated varieties of barley. b, PCA for a diversity panel comprising 200 domesticated (red and green dots) and 100 wild varieties of barley (blue dots). SNP markers detected in whole-genome shotgun data and located in the inverted regions were used. c, Schematic of the inverted region. The HvCEN gene is closest to the breakpoint that is distal in Morex (distance of 449 kb) and proximal in Barke (distance of 433 kb) assemblies. A total of 46 and 44 high-confidence (HC) genes were annotated in the Morex and Barke assemblies, respectively. The yellow arrows (not drawn to scale) mark the positions of PCR primers to probe for the presence of the inversion (Supplementary Fig. 2c).
Recombination frequencies were increased in the flanking regions of the inversion in the RGT Planet × Hindmarsh population relative to Morex × Barke, which suggests a compensatory mechanism to maintain an average number of one-to-two crossovers per chromosome in the presence of large tracts of suppressed recombination35.

By focusing on the inversion breakpoints in the RGT Planet sequence assembly, we designed a diagnostic PCR assay (Supplementary Fig. 2a, b, d) to rapidly genotype the presence of the inversion in 1,406 accessions (Supplementary Table 8). The inverted haplotype occurred at low frequency (1.3%) in the whole panel, but was found in many lines in the RGT Planet pedigree (Supplementary Fig. 3), including commercially successful barley cultivars of past decades, such as Triumph, Quench and Sebastian. The earliest cultivar that carried the inversion was Diamant. As one of the donors of the semi-dwarf growth habit, Diamant was a highly influential founder line of modern barley breeding and traces back to a mutant induced by gamma irradiation of the Czech cultivar Valticky36. We genotyped several gene bank accessions and germplasm samples of both Valticky and Diamant. Notably, none of the Valticky samples carried the inversion, whereas it segregated in the Diamant samples (Fig. 3d). Quantitative trait loci mapping for yield-related traits in the RGT Planet × Hindmarsh population did not show signals on chromosome 7H (Supplementary Fig. 2e, Supplementary Table 9), consistent with selective neutrality of the inversion. This strongly suggests that mutation breeding in the 1960s has given rise to a cryptic large inversion, which, unbeknownst to breeders, segregates in elite varieties of barley.

The second inversion we focused on spanned 10 Mb in the interstitial region of chromosome 2H (Fig. 1b) and was present in 26 out of 69 Hi-C samples (Supplementary Table 8). Local PCA and haplotype analysis in our panel of 200 domesticated and 100 wild varieties of barley indicated a single origin of the inverted haplotype (Fig. 4a, b, Supplementary Fig. 2c). The inversion occurred only among domesticated barley of Western geographical origin9, indicating that it arose or has risen to high frequency after domestication. The inverted region contains 46 high-confidence genes in the Morex cultivar. The closest gene to the inversion breakpoint, at 448 kb distance from the distal breakpoint in the non-carrier Morex, was HvCENTRORADIALIS (HvCEN)37 (Fig. 4c). Although induced mutants of HvCEN flower very early, natural variation in HvCEN has previously been implicated in environmental adaptation to northern European climates37. All of the inversion carriers we analysed had HvCEN haplotype III, which is associated with later flowering in spring barley varieties from northern Europe37,38. Further research is required to determine whether the inversion close to HvCEN has direct functional consequences (for instance, by modulating HvCEN expression) or whether it hitchhiked along with a tightly linked causal variant.

Discussion
The digital representation of the pan-genome can expand the repertoire of natural or induced sequence variation that is accessible to genetic analyses and breeding. Our comparison of 20 chromosome-scale sequence assemblies has revealed pervasive variation in genes and non-coding regions. Focusing on single-copy sequences, we translated this variation into scorable markers that are amenable to population genetic analysis and association scans. A notable finding was the prevalence of large (more than 5 Mb in size) inversion polymorphisms in current elite germplasm. It is likely that the suppression of genetic recombination in inversion heterozygotes has manifested

itself in hard-to-explain patterns of long-range linkage and segregation distortion between elite lines in breeding programmes. Our map of inversion polymorphisms will provide breeders with a point of reference to avoid, or interpret correctly, crosses between carriers and non-carriers. We found abundant structural variation in 20 representative barley genotypes, but individual events occurred at low frequency (Extended Data Figs. 6, 9). This observation, combined with the slow saturation of the single-copy pan-genome (Fig. 2a), motivates the genomic analysis of more genotypes to expand the barley pan-genome. The next phase of barley pan-genomics will focus on an augmented panel of domesticated and wild germplasm, working towards the long-term goal of high-quality genome sequences of all barley plant genetic resources as part of a biodigital resource centre39,40.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2947-8.

1. Bayer, P. E., Golicz, A. A., Scheben, A., Batley, J. & Edwards, D. Plant pan-genomes are the new reference. Nat. Plants 6, 914–920 (2020).
2. Dawson, I. K. et al. Barley: a translational model for adaptation to climate change. New Phytol. 206, 913–931 (2015).
3. Stein, N. & Muehlbauer, G. J. The Barley Genome (Springer, 2018).
4. International Barley Genome Sequencing Consortium. A physical, genetic and functional sequence assembly of the barley genome. Nature 491, 711–716 (2012).
5. Mascher, M. et al. A chromosome conformation capture ordered sequence of the barley genome. Nature 544, 427–433 (2017).
6. Monat, C. et al. TRITEX: chromosome-scale sequence assembly of Triticeae genomes with open-source tools. Genome Biol. 20, 284 (2019).
7. Mascher, M. et al. Mapping-by-sequencing accelerates forward genetics in barley. Genome Biol. 15, R78 (2014).
8. Russell, J. et al. Exome sequencing of geographically diverse barley landraces and wild relatives gives insights into environmental adaptation. Nat. Genet. 48, 1024–1030 (2016).
9. Milner, S. G. et al. Genebank genomics highlights the diversity of a global barley collection. Nat. Genet. 51, 319–326 (2019).
10. Muñoz-Amatriaín, M. et al. Distribution, functional impact, and origin mechanisms of copy number variation in the barley genome. Genome Biol. 14, R58 (2013).
11. Taketa, S. et al. Barley grain with adhering hulls is controlled by an ERF family transcription factor gene regulating a lipid biosynthesis pathway. Proc. Natl Acad. Sci. USA 105, 4062–4067 (2008).
12. Tettelin, H. et al. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: implications for the microbial “pan-genome”. Proc. Natl Acad. Sci. USA 102, 13950–13955 (2005).
13. Ho, S. S., Urban, A. E. & Mills, R. E. Structural variation in the sequencing era. Nat. Rev. Genet. 21, 171–189 (2019).
14. Danilevicz, M. F., Tay Fernandez, C. G., Marsh, J. I., Bayer, P. E. & Edwards, D. Plant pangenomics: approaches, applications and advancements. Curr. Opin. Plant Biol. 54, 18–25 (2020).
15. Monat, C., Schreiber, M., Stein, N. & Mascher, M. Prospects of pan-genomics in barley. Theor. Appl. Genet. 132, 785–796 (2019).
16. Coronado, M.-J., Hensel, G., Broeders, S., Otto, I. & Kumlehn, J. Immature pollen-derived doubled haploid formation in barley cv. Golden Promise as a tool for transgene recombination. Acta Physiol. Plant. 27, 591–599 (2005).
17. Schreiber, M. et al. A genome assembly of the barley ‘transformation reference’ cultivar Golden Promise. G3 10, 1823–1827 (2020).
18. Gottwald, S., Bauer, P., Komatsuda, T., Lundqvist, U. & Stein, N. TILLING in the two-rowed barley cultivar ‘Barke’ reveals preferred sites of functional diversity in the gene HvHox1. BMC Res. Notes 2, 258 (2009).
19. Mascher, M. et al. Anchoring and ordering NGS contig assemblies by population sequencing (POPSEQ). Plant J. 76, 718–727 (2013).
20. Hübner, S. et al. Strong correlation of wild barley (Hordeum spontaneum) population structure with temperature and precipitation variation. Mol. Ecol. 18, 1523–1536 (2009).
21. Chikhi, R., Limasset, A. & Medvedev, P. Compacting de Bruijn graphs from sequencing data quickly and in low memory. Bioinformatics 32, i201–i208 (2016).
22. Luo, R. et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. Gigascience 1, 18 (2012).
23. Clavijo, B. J. et al. An improved assembly and annotation of the allohexaploid wheat genome identifies complete families of agronomic genes and provides genomic evidence for chromosomal translocations. Genome Res. 27, 885–896 (2017).
24. Anderson, S. N. et al. Transposable elements contribute to dynamic genome content in maize. Plant J. 100, 1052–1065 (2019).
25. Brunner, S., Fengler, K., Morgante, M., Tingey, S. & Rafalski, A. Evolution of DNA sequence nonhomologies among maize inbreds. Plant Cell 17, 343–360 (2005).
26. Nattestad, M. & Schatz, M. C. Assemblytics: a web analytics tool for the detection of variants from an assembly. Bioinformatics 32, 3021–3023 (2016).
27. Gordon, S. P. et al. Extensive gene content variation in the Brachypodium distachyon pan-genome correlates with population structure. Nat. Commun. 8, 2184 (2017).
28. Yu, S. et al. A single nucleotide polymorphism of Nud converts the caryopsis type of barley (Hordeum vulgare L.). Plant Mol. Biol. Report. 34, 242–248 (2016).
29. Arora, S. et al. Resistance gene cloning from a wild crop relative by sequence capture and association genetics. Nat. Biotechnol. 37, 139–143 (2019).
30. Lipka, A. E. et al. GAPIT: genome association and prediction integrated tool. Bioinformatics 28, 2397–2399 (2012).
31. Yu, J. et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38, 203–208 (2006).
32. Ekberg, I. Cytogenetic studies of three paracentric inversions in barley. Hereditas 76, 1–30 (1974).
33. Ramage, R. & Suneson, C. Translocation-gene linkages on barley chromosome 7. Crop Sci. 1, 319–320 (1961).
34. Himmelbach, A. et al. Discovery of multi-megabase polymorphic inversions by chromosome conformation capture sequencing in large-genome plant species. Plant J. 96, 1309–1316 (2018).
35. Ederveen, A., Lai, Y., van Driel, M. A., Gerats, T. & Peters, J. L. Modulating crossover positioning by introducing large structural changes in chromosomes. BMC Genomics 16, 89 (2015).
36. Bouma, J. & Ohnoutka, Z. Importance and Application of the Mutant ‘Diamant’ in Spring Barley Breeding (IAEA, 1991).
37. Comadran, J. et al. Natural variation in a homolog of Antirrhinum CENTRORADIALIS contributed to spring growth habit and environmental adaptation in cultivated barley. Nat. Genet. 44, 1388–1392 (2012).
38. Bustos-Korts, D. et al. Exome sequences and multi-environment field trials elucidate the genetic basis of adaptation in barley. Plant J. 99, 1172–1191 (2019).
39. Mascher, M. et al. Genebank genomics bridges the gap between the conservation of crop diversity and plant breeding. Nat. Genet. 51, 1076–1081 (2019).
40. Khan, A. W. et al. Super-pangenome by integrating the wild side of a species for accelerated crop improvement. Trends Plant Sci. 25, 148–158 (2020).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2020

Methods

No statistical methods were used to predetermine sample size. The experiments were not randomized, and investigators were not blinded to allocation during experiments and outcome assessment.

Library preparation, sequencing data generation and genome assembly of 20 diverse varieties of barley
High-molecular-weight DNA was extracted from one-week-old seedlings of 20 diverse barley accessions given in Supplementary Table 10, using a previously described large-scale DNA extraction protocol41. For the NRGene DeNovoMAGIC3.0 assemblies, 450-bp paired-end (PE450) libraries of Morex, Barke, HOR 10350 and B1K-04-12 were prepared at the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben. The 450-bp paired-end libraries for other accessions, 800-bp paired-end libraries and mate-pair libraries of three sizes were prepared and sequenced at the University of Illinois Roy J. Carver Biotechnology Center. The 10X Genomics Chromium libraries were prepared at the University of Saskatchewan Wheat Molecular Breeding Laboratory and sequenced by Genome Quebec or prepared and sequenced at the Roy J. Carver Biotechnology Center, using the manufacturers’ recommendations. Published tethered chromosome conformation data for Morex, Barke, HOR 10350 and B1K-04-12 (ref. 42) were used for scaffolding the respective genome. For the other accessions, in situ Hi-C libraries were prepared using a previously described method43. Sequencing data generated from each of the libraries are given in Supplementary Table 10. NRGene DeNovoMAGIC3.0 scaffold assemblies were provided for Barke, HOR 10350 and B1K-04-12. The 10X Chromium, population sequencing (POPSEQ) and Hi-C data were then used to prepare chromosome-scale assemblies using the TRITEX pipeline6 (commit: 7041ff2). For the other assemblies, the TRITEX pipeline was also used for contig assembly and scaffolding with mate-pair and 10X data (Extended Data Table 1). High-confidence gene models annotated on the Morex V2 reference6 and full-length cDNA sequences44 were aligned to the assemblies to assess gene-space completeness with the parameters of ≥90% query coverage and ≥97% (≥90% for full-length cDNA) identity.

Tissue collection and RNA extraction
Plant material for the collection of tissues for RNA sequencing (RNA-seq) and Iso-Seq was grown in the greenhouse at IPK Gatersleben with day–night temperatures of 21 °C–18 °C. Embryonic tissue, leaves, roots, internode, inflorescence (5 mm) and developing seeds (5 and 15 days after pollination) were collected as previously described4, snap-frozen in liquid nitrogen and stored at −80 °C until RNA extractions were performed. RNA was extracted from the collected tissues using a Trizol extraction protocol4 and purified using Qiagen RNeasy miniprep columns as per the manufacturer’s instructions. RNA quality was checked on Agilent RNA HS screen tape and RNA with RIN value greater than 8 was used for RNA-seq and Iso-Seq library construction.

RNA-seq library preparation and data generation
RNA-seq libraries were prepared from purified RNA using the TruSeq RNA sample preparation kit (Illumina) as per the manufacturer’s recommendation at IPK Gatersleben. Libraries were pooled at equimolar concentrations, quantified by qPCR and paired-end-sequenced on an Illumina HiSeq 2500 for 200 cycles. The data generated for each tissue are given in Supplementary Table 2.

Iso-Seq data generation and analysis
Two libraries for each embryonic tissue RNA and pooled RNA from seven tissues (described in ‘Tissue collection and RNA extraction’) were prepared for Barke and HOR 10350 using the PacBio Iso-Seq protocol. In brief, double-stranded cDNA was synthesized using SMARTer PCR cDNA synthesis kit (Clontech; cat. no. 634925). Two fractions of cDNA with different size profiles were prepared by using differing ratios of DNA to Ampure XP beads (Beckman Coulter, cat. no. A63882). Equimolar concentrations of each fraction were pooled, and a minimum of one microgram of double-stranded cDNA was used for Iso-Seq library construction as per the PacBio library construction protocol. Two additional libraries from pooled RNA tissues were prepared using cDNA prepared from TeloPrime v.1.0 kit (Lexogen) following the manufacturer’s instructions. Libraries were quantified and sequenced on a PacBio Sequel device at IPK Gatersleben. Data were analysed using SMRTLink v.5.0 Isoseq v.1.0 pipeline or Isoseq3 pipeline (https://github.com/PacificBiosciences/IsoSeq_SA3nUP/wiki/Tutorial:-Installing-and-Running-Iso-Seq-3-using-Conda). The steps involved in Iso-Seq data analysis were the generation of circular consensus sequences, and then the classification of circular consensus sequence reads into full-length non-chimeric reads and non-full-length reads on the basis of the presence of primer sequences and polyA sequences. Full-length non-chimeric reads were then clustered on the basis of sequence similarity to yield high- and low-quality isoforms. The data generated and method of library preparation are given in Supplementary Table 2.

Gene projections and repeat annotation
Gene models for Morex, Barke and HOR 10350 were predicted using transcriptome data (Supplementary Table 2) and protein homology evidence, and derived by a previously described annotation pipeline5. High-confidence gene models from these accessions were aligned to pseudo-chromosomes of each accession separately using blat45. For each genomic region identified by blat, additional alignments were performed by exonerate46 in its genomic neighbourhood ranging between 20 kb upstream and 20 kb downstream of the match position. A series of quality criteria was applied to select high-confidence gene models in each accession. The functional annotation for genes of 20 accessions was carried out using the AHRD pipeline v.3.3.3 (https://github.com/groupschoof/AHRD). Orthologous gene groups between the twenty accessions were predicted using OrthoFinder47 v.2.3.1 with default parameters.

Repeat annotation
To obtain a consistent transposon annotation across all lines for transposons and tandem repeats, the same methods were applied to all 20 barley lines. Transposons were detected and classified by homology search against the REdat_9.7_Poaceae section of the PGSB transposon library48. The program vmatch (http://www.vmatch.de, version 2.3.0) was used for that purpose as a fast and efficient matching tool that is well-suited for such large and highly repetitive genomes. Vmatch was run with the following parameters: identity ≥ 70%, minimal hit length 75 bp, seed length 12 bp (exact command line: -d -p -l 75 -identity 70 -seedlength 12 -exdrop 5). To remove overlapping annotations, the vmatch output was filtered for redundant hits via a priority-based approach. Higher scoring matches were assigned first and lower scoring hits at overlapping positions were either shortened or removed if they were contained to ≥90% in the overlap or <50 bp of rest length remained. The resulting transposon annotations are overlap-free, but disrupted elements from nested insertions have not been defragmented into one element. Still-intact full-length LTR retrotransposons were identified with LTRharvest49, a program that scans the genome for LTR retrotransposon-specific structural hallmarks, such as long terminal repeats, RNA cognate primer binding sites and target site duplications. LTRharvest (included in genometools 1.5.9) was run with the following parameter settings: ‘-overlaps best -seed 30 -minlenltr 100 -maxlenltr 2000 -mindistltr 3000 -maxdistltr 25000 -similar 85 -mintsd 4 -maxtsd 20 -motif tgca -motifmis 1 -vic 60 -xdrop 5 -mat 2 -mis -2 -ins -3 -del -3’. All candidates were annotated for PfamA domains using hmmer3 (http://hmmer.org, version 3.1b2) and filtered to remove false positives. The inner domain order served as a criterion for the LTR-retrotransposon superfamily classification into either Gypsy or Copia. In the cases of
insufficient domain information, the elements were assigned as still undetermined.
Most of the transposons insert at random locations leading to novel and usually unique sequence stretches at both borders around the inserted element and the neighbouring original sequence. The de novo detected full-length LTR set provides defined element borders, a prerequisite for the exact positioning of transposable element junctions. We used 100-bp single transposable element junctions with 50 bp outside and 50 bp inside the element from both sides of the element and merged them to 200 bp joined junctions per element. Junctions from the reverse strand were reverse-complemented. The 200-bp joined junctions from all 20 lines were clustered with vmatch dbcluster (http://www.vmatch.de, version 2.3.0) at 98% identity and 98% mutual length coverage (command-line parameters: 98 98 -e 2 -l 98 -d). About 97% of the clusters belonged to the 1:1 type with a maximum of 1 member per line and were used for the downstream analyses. Using the above-described 200-bp joined junctions instead of full sequences reduces the amount of data for clustering to 2%, from about 10 kb to 200 bp per full-length LTR element, thus allowing a sequence clustering of 2.2 million elements in the first place. By including sequence information outside of the element, the repetitiveness of high-copy transposable element families is removed and at the same time the syntenic context is provided even for elements located on chrUn (that is, not assigned to chromosomal pseudomolecules).

PAV detection and validation
Owing to higher sensitivity in detecting deletions over insertions, a paired genome alignment strategy was used in which each assembly was aligned to reference genome Morex reciprocally by treating Morex as a query and reference using Minimap2 (v.2.17)50. From these two alignments, insertions and deletions were called using Assemblytics (v.1.2.1)26. Then, only deletions were selected in both alignments and converted into PAVs with regard to Morex. In addition, a hard filter was used to discard PAVs containing more than 5% gaps (Ns) and nested PAVs. We used a previously described method51 to map deletions longer than 5 kb in Barke relative to Morex using whole-genome shotgun data for 90 Morex × Barke recombinant inbred lines19. Mosdepth (v.0.2.9)52 was used for determining read depth in genomic intervals.

k-mer-based genome-wide association
PAVs overlapping with single copy regions were identified by BedTools (v.2.28.0)53. k-mer sequences with step size of 2 bp were retrieved from single-copy regions residing within PAVs. The abundances of the extracted k-mer sequences were counted in sequence reads using BBDuk (BBMap_37.93) (https://sourceforge.net/projects/bbmap/). k-mer counts were obtained for whole-genome shotgun data of 300 diverse varieties of barley generated in the present study and previously published genotyping-by-sequencing data9. k-mer counts were imported into R (v.3.5.1)54 and normalized for differences in read depth between samples. The normalized k-mer counts were then used for genome-wide association scans using GAPIT3 (ref. 30) and PCA using standard R functions.

Construction of single-copy pan-genome
To identify single-copy regions in each genome, genomic regions covered by 31-mers occurring more than once were masked using BBDuk (BBMap_37.93)55. Based on masking, single-copy regions in each assembly were obtained in .bed format and subsequently related sequences were retrieved using BEDTools (v2.28.0)53. Single-copy sequences from all the assemblies were combined to perform an all-against-all blast search. The blast results were filtered (>90% identity and minimum 80% alignment length) and then clustered using the igraph package56. A representative from each cluster (the largest contained sequence) was selected and used for estimating pan-genome size. Clusters shared by all the 20 accessions are referred to as the core genome, and clusters with sequences originating from 1 to 19 genotypes are considered as the variable genome.

Hi-C library preparation, sequencing and inversion calling
In situ Hi-C libraries were prepared from one-week-old seedlings of barley IPK core50 collection9 (Supplementary Table 5) based on a previously described protocol43. Sequencing, Hi-C raw data processing and inversion calling were performed as previously described34 using the MorexV2 reference genome sequence assembly6. The breakpoint regions were identified by pairwise genome alignment using Minimap2 (v.2.17)50 and PipMaker (http://pipmaker.bx.psu.edu/cgi-bin/pipmaker?basic)57.

Resequencing, SNP calling and PCA
Raw reads (Supplementary Table 4) were trimmed with cutadapt (v.1.15) and aligned to the MorexV2 genome assembly6 using Minimap2 (v.2.17)50. The alignments were sorted using Novosort (V3.06.05) (http://www.novocraft.com). BCFtools (v.1.8)58 was used to call SNPs and short insertions and deletions (indels). The resulting VCF file was converted into Genomic Data Structure (GDS) format using SeqArray package59 in R to obtain a SNP matrix. Finally, hard filtering was applied to remove SNPs having more than 10% missing data and heterozygosity. Previously generated genotyping-by-sequencing data9 were aligned to the MorexV2 reference and SNPs were identified using a previously described variant calling pipeline9. PCAs were performed using snpgdsPCA() function of the package SNPrelate60.

RGT Planet × Hindmarsh mapping population
A cross was made between RGT Planet (maternal plant) and Hindmarsh (pollen donor). In total, 38 F2 plants from the direct cross and 233 individual heads from F3 seeds were progressed to the F6 generation by single seed descent method. The F6 recombinant inbred lines (RIL) (224 in total) were used for construction of a genetic linkage map. Genomic DNA was extracted from the leaves of a single plant per RIL using the cetyl-trimethyl-ammonium bromide method. DNA quality was assessed on 1% agarose gels and quantified using a NanoDrop spectrophotometer (Thermo Scientific NanoDrop Products). DNA was diluted to 50 ng/μl and placed in a 96-well plate for PCR. DArT-seq genotyping-by-sequencing was performed using the DArT-seq platform (DArT PL) according to the manufacturer’s protocol (https://www.diversityarrays.com/). In brief, 100 μl of 50 ng μl−1 DNA was sent to DArT PL, and genotyping-by-sequencing was performed using complexity reduction followed by sequencing on a HiSeq Illumina platform as previously described61 (Supplementary Table 9). Sequences flanking polymorphisms detected by DArT-seq were aligned against the MorexV2 genome assembly to determine their physical positions (Supplementary Table 7).

Field experiments and phenotypic data
Field experiments were conducted at six sites: Gibson, Western Australia (WA, −33.612176, 121.798438); Williams, Western Australia (−33.577668, 116.734934); Wongan Hills, Western Australia (−30.848953, 116.756461); Merredin, Western Australia (−31.487009, 118.229668); South Perth, Western Australia (−31.991186, 115.887944); and Shepperton, Victoria (−36.487551, 145.388470). The distance between South Perth and Shepperton is over 3,300 km. The Merredin site is located inland and receives little rainfall, whereas the Gibson site receives a high amount of rainfall; the other sites are in between. The experimental design for field trial sites was performed as previously described62. In brief, all regional field trials (partially replicated design) were planted in a randomized complete block design using plots of 1 by 5 m2, laid out in a row–column format and the middle 3 m was harvested for grain yield. Field trials in South Perth and Shepperton were conducted using double rows with a 40-cm distance within and between rows, owing to space constraints. Seven control varieties were used for spatial adjustment of the experimental
data. Measurements were taken at each plot of each field experiment in the study to determine flowering time (days to Zadoks stage (ZS) 49), plant height and grain yield. In brief, heading date was recorded as the number of days from sowing to 50% awn emergence above the flag leaf (ZS49), as a proxy for flowering time. Plant height was determined by estimating the average height from the base to the tip of the head of all plants in each plot. Grain yield (kg ha−1) was determined by destructively harvesting all plant material from each plot to separate the grain, and then determining grain mass. Grain yield data of the field experiments, as well as plant height and heading data, were analysed using linear mixed models in ASReml-R (https://www.vsni.co.uk/software/asreml-r/) to determine best linear unbiased predictions or best linear unbiased estimations for each trait for further analysis. Local best practices for fertilization and disease control were adopted for each trial site.

Quantitative trait loci (QTL) mapping
Software MapQTL6 was used for the QTL analysis63. The genotypic data, phenotypic data and genetic map were formatted and imported to MapQTL6. Interval mapping was conducted for each trait, and then the markers with a logarithm of odds (LOD) value of above 3.0 were selected as cofactors. Multiple QTL model mapping was performed to re-calculate the QTL. If the markers with the highest LOD value were inconsistent with the cofactor markers, then the new markers were selected as cofactors and the QTL were re-calculated. The QTL results and charts were exported from the software.

Long-read sequence assembly of the Morex cultivar
PacBio libraries were constructed using SMRTbell Template Prep Kit 1.0 and sized on a SAGE Blue Pippin instrument (20–80 kb). Sequencing was performed on a Sequel II device at the HudsonAlpha Institute using V.1.0 chemistry and 10-h movie time. Data were generated from a total of five SMRT cells, yielding 604 Gb of raw sequence reads. A total of 520.72 Gb of this set (104.15×) was used for assembly (Supplementary Tables 11, 12). Previously published Illumina short-read data (ERR3183748 and ERR31837496 (Supplementary Table 12)) were used for polishing and error correction. Before use, Illumina fragment reads were screened for PhiX contamination. Reads composed of >95% simple sequences were removed. Illumina reads shorter than 50 bp after trimming for adaptor and quality (q < 20) were removed. The final read set consists of 605,178,701 reads, representing a total of 43.17× of high-quality Illumina bases. The initial assembly was generated by assembling 32,743,478 PacBio reads (104.15× sequence coverage) using MECAT (v.1.1)64 and subsequently polished using Arrow65. This produced an initial assembly of 1,577 scaffolds (1,577 contigs), with a contig N50 of 10.4 Mb, 987 scaffolds larger than 100 kb and a total genome size of 4,139.8 Mb (Supplementary Table 13).
A first round of breaking chimeric scaffolds was done using the POPSEQ genetic map19 to identify contigs bearing markers from distant genomic regions. A total of 17 misjoins were identified and resolved. Homozygous SNPs and indels were corrected in the release consensus sequence using a subset of about 30× of the Illumina reads described above in this section. Reads were aligned using BWA-MEM66. Homozygous SNPs and indels were discovered with GATK’s UnifiedGenotyper tool67. A total of 59 homozygous SNPs and 15,759 homozygous indels were corrected. After these correction steps, the assembly contains 4,139.7 Mb of sequence, consisting of 1,594 contigs with a contig N50 of 10.2 Mb. A second round of chimaera breaking was done by inspecting Hi-C contact matrices as described in the TRITEX pipeline6. Published Hi-C data of the Morex cultivar were used5. Corrected contigs were arranged into pseudomolecules using TRITEX.

Comparison of PacBio continuous long read (CLR) and TRITEX assemblies of the Morex cultivar
Full-length cDNA sequences44 were aligned to the assemblies to assess gene space completeness. Only alignments with query coverage ≥90% and identity ≥90% were considered. Whole-genome alignments were done with Minimap2. Structural variant calling with Assemblytics (v.1.2.1)26 (Morex TRITEX versus Morex CLR; Morex CLR versus Barke) and extraction of single-copy regions were done as described in ‘PAV detection and validation’.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
All raw sequence data collected in this study and sequence assemblies have been deposited at the European Nucleotide Archive (ENA). Accession codes for raw data and assemblies are listed in Supplementary Tables: Supplementary Table 14 (assemblies), Supplementary Table 10 (assembly raw data), Supplementary Table 4 (whole-genome shotgun sequencing), Supplementary Table 5 (Hi-C) and Supplementary Table 9 (DArT-seq). Assemblies, annotations and analysis results were deposited under a DOI in the PGP repository68 using the e!DAL submission system69 and are accessible under the URL https://doi.org/10.5447/ipk/2020/24. Assemblies and gene annotations can also be downloaded from https://barley-pangenome.ipk-gatersleben.de. The Barley Pedigree Catalogue is available at http://genbank.vurv.cz/barley/pedigree/.

Code availability
Source code is released in a public Bitbucket repository, at https://bitbucket.org/ipk_dg_public/barley_pangenome/.

41. Dvorak, J., McGuire, P. E. & Cassidy, B. Apparent sources of the A genomes of wheats inferred from polymorphism in abundance and restriction fragment length of repeated nucleotide sequences. Genome 30, 680–689 (1988).
42. Himmelbach, A., Walde, I., Mascher, M. & Stein, N. Tethered chromosome conformation capture sequencing in Triticeae: a valuable tool for genome assembly. Bio Protoc. 8, e2955 (2018).
43. Padmarasu, S., Himmelbach, A., Mascher, M. & Stein, N. in Plant Long Non-Coding RNAs (eds Chekanova, J. & Wang, H.-L.) 441–472 (Springer, 2019).
44. Matsumoto, T. et al. Comprehensive sequence analysis of 24,783 barley full-length cDNAs derived from 12 clone libraries. Plant Physiol. 156, 20–28 (2011).
45. Kent, W. J. BLAT—the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
46. Slater, G. S. C. & Birney, E. Automated generation of heuristics for biological sequence comparison. BMC Bioinformatics 6, 31 (2005).
47. Emms, D. M. & Kelly, S. OrthoFinder: solving fundamental biases in whole genome comparisons dramatically improves orthogroup inference accuracy. Genome Biol. 16, 157 (2015).
48. Spannagl, M. et al. PGSB PlantsDB: updates to the database framework for comparative plant genome research. Nucleic Acids Res. 44, D1141–D1147 (2016).
49. Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18 (2008).
50. Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
51. Gutierrez-Gonzalez, J. J., Mascher, M., Poland, J. & Muehlbauer, G. J. Dense genotyping-by-sequencing linkage maps of two synthetic W7984×Opata reference populations provide insights into wheat structural diversity. Sci. Rep. 9, 1793 (2019).
52. Pedersen, B. S. & Quinlan, A. R. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics 34, 867–868 (2018).
53. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
54. R Core Team. R: A Language and Environment for Statistical Computing http://www.R-project.org (R Foundation for Statistical Computing, 2013).
55. Bushnell, B. BBMap: A Fast, Accurate, Splice-aware Aligner (Lawrence Berkeley National Laboratory, 2014).
56. Csardi, G. & Nepusz, T. The igraph software package for complex network research. InterJournal Complex Syst. 1695, 1–9 (2006).
57. Schwartz, S. et al. PipMaker—a web server for aligning two genomic DNA sequences. Genome Res. 10, 577–586 (2000).
58. Li, H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics 27, 2987–2993 (2011).
59. Zheng, X. & Gogarten, S. SeqArray: big data management of genome-wide sequence variants. R package version 1.10.6 https://github.com/zhengxwen/SeqArray (accessed January 2017).
60. Zheng, X. et al. A high-performance computing toolset for relatedness and principal component analysis of SNP data. Bioinformatics 28, 3326–3328 (2012).
61. Akbari, M. et al. Diversity arrays technology (DArT) for high-throughput profiling of the hexaploid wheat genome. Theor. Appl. Genet. 113, 1409–1420 (2006).
62. Hill, C. B. et al. Hybridisation-based target enrichment of phenology genes to dissect the genetic basis of yield and adaptation in barley. Plant Biotechnol. J. 17, 932–944 (2019).
63. Van Ooijen, J. MapQTL 5, Software for the Mapping of Quantitative Trait Loci in Experimental Populations (Kyazma, 2004).
64. Xiao, C. L. et al. MECAT: fast mapping, error correction, and de novo assembly for single-molecule sequencing reads. Nat. Methods 14, 1072–1074 (2017).
65. Chin, C. S. et al. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nat. Methods 10, 563–569 (2013).
66. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
67. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
68. Arend, D. et al. PGP repository: a plant phenomics and genomics data publication infrastructure. Database (Oxford) 2016, baw033 (2016).
69. Arend, D. et al. e!DAL—a framework to store, share and publish research data. BMC Bioinformatics 15, 214 (2014).

Acknowledgements We thank M. Knauft, I. Walde and S. König for technical assistance; D. Schüler for sequence data management; J. Bauernfeind, T. Münch and H. Miehe for IT administration; D. Arend for help with data submission; M. Bayer for advice on transcriptome analysis; and M. Herz for pedigree information. This research was supported by grants from the German Federal Ministry of Education and Research to N.S., M.M., U.S., M.S. and K.F.X.M. (SHAPE, FKZ 031B0190), to U.S. and K.F.X.M. (de.NBI, FKZ 031A536) and to N.S. (COBRA, FKZ 031A323A); the Australian Grain Research and Development Cooperation (9176507) to C.L., K.C., P.L. and P.W.; JST CREST Japan (no. JPMJCR16O4 to K.M. and T.H.); JST Mirai Program Japan (no. 18076896 to K.S.); the National Key R&D Program of China (2018YFD1000701 and 2018YFD1000700) to D.X. and J.Z.; by funding from the China Agriculture Research System (CARS-05) and the Agricultural Science and Technology Innovation Program to C.W. and G.G. Support for 10X sequencing was provided by a research grant from Genome Canada and Genome Prairie to C.P. and J.E.; and by the Natural Science Foundation of China (31620103912) and the National Key R&D Program of China (2018YFD1000706) to G.Z. We acknowledge support from the European Research Council (ERC Shuffle, project identifier: 66918) to R.W.

Author contributions N.S. and M.M. designed the study. N.S. coordinated experiments and sequencing. M.M. supervised sequence assembly. M. Spannagl and K.F.X.M. supervised annotation. U.S. supervised data management and submission. S.P., A.H., J.E., D.X., L.B.B. and J.G. performed sequencing experiments. M.J., C.M., Y.G., C.P., J.J. and J.S. performed sequence assembly. M.J. performed structural variation and genome-wide association scan analysis. A.F. submitted sequence data. G.H., T.L., H.G., V.S.B., N.K. and D.L. annotated and analysed genes and transposable elements. S.P., M.J., X.-Q.Z., T.T.A., G. Zhou, C.T., C.H., P.W., M.M. and C.L. analysed polymorphic inversions. H.B., J.G., J.S., J.Z., C.W., G.G., G. Zhang, K.M., T.H., K.S., K.J.C., P.L., C.J.P., C.L., M. Schreiber, R.W. and N.S. contributed sequence data. M.J., S.P., C.L. and M.M. wrote the paper with input from all co-authors.

Competing interests The authors declare no competing interests.

Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2947-8.
Correspondence and requests for materials should be addressed to C.L., M.M. or N.S.
Peer review information Nature thanks Victor Albert, Scott Jackson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Reprints and permissions information is available at http://www.nature.com/reprints.
Extended Data Fig. 1 | Pan-genome selection in the global barley diversity space. PCA with genotyping-by-sequencing data of 19,778 varieties of domesticated barley sampled from the gene bank of the IPK9. The first six principal components are shown. Samples are coloured to highlight the pan-genome selection (first row), or according to geographic origin (second row), row type (third row) or annual growth habit (fourth row). The proportion of variance explained by the principal components is indicated in the axis labels of the first row. The map was created with the R package mapdata54.
Extended Data Fig. 2 | Comparison between long-read and short-read assemblies of the Morex cultivar. a, Co-linearity between Morex V2 (short-read) assembly and the Morex PacBio CLR assembly at the pseudomolecule level. b, Summary statistics of the Morex PacBio CLR assembly and Morex V2 assembly. c, Alignment of NUDUM locus (16 kb) between Morex PacBio CLR and Morex V2. d, Structural variants between Morex V2 and Morex PacBio CLR assemblies as detected and classified by Assemblytics. e, PAVs between Barke and the Morex V2 and Morex CLR assemblies.
Extended Data Fig. 3 | Assessment of contiguity and completeness in 20 genome assemblies. a, Whole-genome alignments of assemblies of 19 diverse barley accessions to the Morex V2 reference assembly. b, Alignment summary of full-length coding sequences (32,878) from the MorexV2 annotation and full-length cDNAs (28,622 full-length cDNAs) in each assembly. Alignments with less than 90% query coverage and 97% (less than 90% for full-length cDNAs) identity were discarded. c, Whole-genome alignments show some examples of large chromosomal inversions identified using Hi-C data.
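The alignment filter stated in panel b can be sketched in a few lines of Python. This is an illustration only, not the authors' pipeline: the `Alignment` record, its field names and the way identity is computed (identical bases divided by aligned query bases) are assumptions, and the caption is read as requiring both thresholds to be met for an alignment to be kept.

```python
from dataclasses import dataclass

# Minimum (query coverage %, identity %) per sequence type, as read from the caption.
THRESHOLDS = {
    "cds": (90.0, 97.0),   # full-length coding sequences from the annotation
    "cdna": (90.0, 90.0),  # full-length cDNAs
}


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record of one query-to-assembly alignment."""
    query_id: str
    query_length: int    # length of the query sequence (bp)
    aligned_length: int  # query bases covered by the alignment (bp)
    matches: int         # identical bases within the aligned part

    @property
    def query_coverage(self) -> float:
        return 100.0 * self.aligned_length / self.query_length

    @property
    def identity(self) -> float:
        # Assumed definition of identity; aligners may define it differently.
        return 100.0 * self.matches / self.aligned_length


def passes_filter(aln: Alignment, kind: str) -> bool:
    """Keep an alignment only if it reaches both thresholds for its type."""
    min_cov, min_id = THRESHOLDS[kind]
    return aln.query_coverage >= min_cov and aln.identity >= min_id


def filter_alignments(alignments, kind):
    """Return the alignments that survive the filter, in input order."""
    return [aln for aln in alignments if passes_filter(aln, kind)]
```

Keeping the thresholds in one table makes the only difference between the two sequence types (97% versus 90% identity) explicit.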
Extended Data Fig. 4 | Pairwise shared syntenic full-length LTR locations. The wild variety B1K-04-12 is set apart as an outgroup, as it shares only 19–26% of its still-intact full-length LTR positions with the other landraces and cultivars. The highest similarity is found between the Barke and RGT Planet cultivars (67% shared full-length LTRs).
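One way such pairwise percentages could be derived from the junction clusters described in the Methods is sketched below. The sketch assumes that each cluster is represented as a set of accession names (at most one member per accession, as for the 1:1 clusters) and that the denominator is the number of clusters containing the focal accession; the caption does not state the denominator, so this choice and the toy data are assumptions.

```python
def shared_fraction(clusters, focal, other):
    """Fraction of the focal accession's clustered LTR junctions also present in `other`."""
    focal_clusters = [members for members in clusters if focal in members]
    if not focal_clusters:
        return 0.0
    shared = sum(1 for members in focal_clusters if other in members)
    return shared / len(focal_clusters)


def pairwise_shared(clusters, accessions):
    """Shared fraction for every ordered pair of distinct accessions."""
    return {
        (a, b): shared_fraction(clusters, a, b)
        for a in accessions
        for b in accessions
        if a != b
    }


# Toy clusters for illustration only; these are not data from the study.
toy_clusters = [
    {"Barke", "RGT Planet"},
    {"Barke", "RGT Planet", "Morex"},
    {"Barke"},
    {"Morex"},
    {"B1K-04-12"},
    {"B1K-04-12", "Morex"},
]
toy_result = pairwise_shared(toy_clusters, ["Barke", "RGT Planet", "Morex", "B1K-04-12"])
```

With this definition the measure is asymmetric: the value for (Barke, RGT Planet) need not equal the value for (RGT Planet, Barke), because the two accessions can carry different numbers of intact elements.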
Extended Data Fig. 5 | Gene projection and transposable element annotation. a, Schematic of the gene projection workflow. TE, transposable element. b, Pipeline for annotation and removing transposable elements. c, Steps to identify tandemly arrayed gene (TAG) clusters in each assembly. d, Summary of gene projections and transposable element annotation in 20 accessions. e, Comparison between de novo annotations and gene projections for three genotypes. Reported counts refer to non-transposable-element genes.
Extended Data Fig. 6 | Summary of PAVs detected in pan-genome assemblies. a, Size distribution of PAVs. b, Number of PAVs between 20 genome assemblies. c, Distribution of PAVs along the barley genome. d, Co-linearity between physical position of PAVs detected between the Morex and Barke cultivars, and mapped genetically in the POPSEQ population.
Extended Data Fig. 7 | Analysis of the single-copy pan-genome. a, Pipeline used to select single-copy k-mers in PAVs as markers for genome-wide association scan analysis. b, Summary of single-copy sequence in 20 genome assemblies and results of their clustering. c, Copy number of single-copy sequences in a diversity panel comprising 200 domesticated and 100 wild accessions. Frequency ranges from blue (low) to red (high). d–g, Comparison of PCA on the basis of PAV and SNP variants in whole-genome shotgun data of 200 diverse accessions (d, e) and 19,778 varieties of domesticated barley9 (f, g). Top panels show PCA results from 160,716 PAVs; bottom panels show PCA results from 779,503 of genotyping-by-sequencing SNPs. The accessions are coloured according to geographical origin and row type (using the colour code defined in Extended Data Fig. 1).
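The marker selection in panel a rests on the k-mer steps given in the Methods: k-mers are sampled with a 2-bp step from single-copy regions within PAVs, counted in sequence reads and normalized for read depth. The Python sketch below illustrates these three steps under stated assumptions. It is not the BBDuk-based workflow used in the study; the k-mer length is left as a parameter because the Methods do not give it for this step, reverse-complement matches are ignored, and scaling to counts per million reads is an assumed normalization.

```python
from collections import Counter


def extract_kmers(sequence, k, step=2):
    """Sample k-mers from a single-copy region, moving `step` bases at a time."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(0, len(sequence) - k + 1, step)]


def count_kmers_in_reads(markers, reads, k):
    """Count exact forward-strand occurrences of each marker k-mer in the reads."""
    wanted = set(markers)
    counts = Counter({marker: 0 for marker in wanted})
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in wanted:
                counts[kmer] += 1
    return counts


def normalize_counts(counts, n_reads, scale=1_000_000):
    """Scale raw counts by sequencing depth (assumed: counts per `scale` reads)."""
    return {kmer: count * scale / n_reads for kmer, count in counts.items()}
```

For example, `extract_kmers("ACGTACGTAC", 4)` samples start positions 0, 2, 4 and 6, so neighbouring markers overlap by two bases.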
Extended Data Fig. 8 | PAV-based genome-wide association scans using whole-genome shotgun and genotyping-by-sequencing data. a, Manhattan plots of PAV-based genome-wide association scans for morphological traits, including adherence of grain hull, row type, length of rachilla hairs and awn roughness, using whole-genome shotgun data from 200 diverse varieties of domesticated barley. b, PAV-based genome-wide association scan results for these traits using genotyping-by-sequencing data from 1,000 diverse varieties of domesticated barley collected from the gene bank of the IPK9. The 200 varieties of barley used for whole-genome shotgun sequencing are a subset of the 1,000 genotyping-by-sequencing genotypes.
Extended Data Fig. 9 | Characterization of large inversions in barley. a, Inversion size distribution. b, Recombination in inverted regions. Recombination rate was determined in the Morex × Barke RIL population19 (n = 90 genotypes). c, Number of inversions present as singletons or shared between two or more accessions on each chromosome.
Extended Data Table 1 | Summary statistics of 20 pan-genome assemblies and annotation
#Chromosomes 1H to 7H.
§Non-transposable element models or transposable-element filtered.
Article
PIEZO2 in sensory neurons and urothelial cells coordinates urination

https://doi.org/10.1038/s41586-020-2830-7
Accepted: 22 July 2020
Published online: 14 October 2020

Kara L. Marshall1, Dimah Saade2, Nima Ghitani3, Adam M. Coombs1, Marcin Szczot3, Jason Keller4,5, Tracy Ogata2, Ihab Daou1, Lisa T. Stowers4, Carsten G. Bönnemann2, Alexander T. Chesler2,3 ✉ & Ardem Patapoutian1 ✉

Henry Miller stated that “to relieve a full bladder is one of the great human joys”.
Urination is critically important in health and ailments of the lower urinary tract cause
high pathological burden. Although there have been advances in understanding the
central circuitry in the brain that facilitates urination1–3, there is a lack of in-depth
mechanistic insight into the process. In addition to central control, micturition
reflexes that govern urination are all initiated by peripheral mechanical stimuli such as
bladder stretch and urethral flow4. The mechanotransduction molecules and cell
types that function as the primary stretch and pressure detectors in the urinary tract
mostly remain unknown. Here we identify expression of the mechanosensitive ion
channel PIEZO2 in lower urinary tract tissues, where it is required for low-threshold
bladder-stretch sensing and urethral micturition reflexes. We show that PIEZO2 acts
as a sensor in both the bladder urothelium and innervating sensory neurons. Humans
and mice lacking functional PIEZO2 have impaired bladder control, and humans
lacking functional PIEZO2 report deficient bladder-filling sensation. This study
identifies PIEZO2 as a key mechanosensor in urinary function. These findings set the
foundation for future work to identify the interactions between urothelial cells and
sensory neurons that control urination.
The mechanotransduction channel necessary for urinary reflexes remains unknown. Several ion channels have been implicated in urinary tract function in vivo5–7, but none have been shown to be required for micturition reflexes. Moreover, it is not clear which cells are the primary sensors: umbrella cells of the innermost layer in the urothelium have been proposed to be mechanosensory8,9, but the bladder is also innervated by mechanically sensitive afferents from dorsal root ganglia (DRG)1,10. PIEZO2 is the primary mechanosensor that mediates touch, proprioception and mechanical allodynia in mice11–15. Loss-of-function mutations in PIEZO2 also resulted in complete deficits in these senses in humans13,16. Furthermore, PIEZO2 mediates interoceptive processes such as lung-stretch sensing and baroreception in mice17,18, but interoceptive deficits have not been studied in humans who are deficient in PIEZO2. As urination is driven by mechanical interoceptive reflexes, we investigated whether PIEZO2 is important for urination.

To understand how PIEZO2 contributes to urination in humans, PIEZO2-deficient individuals (n = 12; 5–43 years of age) answered questionnaires designed to capture pathology and validated against healthy control individuals to screen for voiding and elimination dysfunction19 (Fig. 1). We also assessed urological history, previous medical evaluations and non-invasive bladder ultrasound scans (Supplementary Table 1). All patients reported decreased voiding frequency, as low as once or twice daily, regardless of hydration status. Notably, the majority of individuals reported that they could spend an entire day without feeling the need to void and therefore followed a voiding schedule. A healthy frequency is defined as five to six voids per day. Despite the lack of normal sensory feedback, all patients had achieved continence at the time of evaluation except for one nine year old. However, many patients reported sudden urge incontinence, where any delay in voiding resulted in urinary accidents. Two individuals reported occasional nocturnal enuresis, and four had stress incontinence caused by laughter, cough and/or postural changes, with one case being severe enough to require treatment. Several patients had a sensation of incomplete voiding and an irregular urinary stream. Three adults described a sensation of pelvic heaviness when their bladder was full, and all three independently reported voiding by leaning over or using their hands to apply pressure to their lower abdomen. Overall, these data suggest that PIEZO2 has a key functional role in human urination.

We next carried out studies in mice to understand where and how PIEZO2 functions in the urinary tract. To test whether Piezo2 is present in bladder sensory neurons, we used RNA fluorescent in situ hybridization (FISH) in DRG tissue taken from three mice after injection of cholera toxin B–Alexa Fluor 488 (CTB), a neuronal tracer, into the bladder wall (Fig. 2a). Out of 92 bladder-innervating neurons labelled with CTB, 75 expressed Piezo2 transcript (81.5%). Piezo2 transcript was also detected in a subset of bladder urothelial cells expressing Krt20 (Fig. 2b), a marker of umbrella cells that line the bladder lumen and have been proposed to contribute to detection of bladder filling8. Seventy-four per cent of
1Howard Hughes Medical Institute, Department of Neuroscience, Dorris Neuroscience Center, The Scripps Research Institute, La Jolla, CA, USA.
2National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA.
3National Center for Complementary and Integrative Health, National Institutes of Health, Bethesda, MD, USA.
4Department of Neuroscience, Dorris Neuroscience Center, The Scripps Research Institute, La Jolla, CA, USA.
5Present address: Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA, USA.
✉e-mail: alexander.chesler@nih.gov; ardem@scripps.edu

Patient   1    2    3    5    7    9    10   12
Age       23   13   16   18   10   9    5    36
Sex       F    F    M    M    F    F    F    F

In a normal day I go to the washroom to pee (no. of times): *, 3–4, 1–2, 1–2, 1–2, 3–4, 3–4, 5–6†
I pee in my underwear during the day (days per week): 1 1
When I pee in my underwear, it is: Damp Damp
I feel that I have to rush to the washroom to pee
I hold my pee by crossing my legs or sitting down
It hurts when I pee
I wet my bed at night
I wake up to pee at night (nights per month): 3–4
When I pee, it stops and starts
I have to push or wait for my pee to start
Fig. 1 | Urinary dysfunction in individuals deficient in PIEZO2. Patient numbers correspond to those in Supplementary Table 1. Grey indicates a neutral answer or one not indicating pathology. Urinary frequency information is scored differently to the other questions, and is colour-coded according to the pathological score assigned to the answer in the questionnaire. Unless otherwise noted, the colour code indicates the following: grey, never (no pathology); blue, less than half of the time (pathology score of 1); yellow, half of the time (pathology score of 2); orange, more than half of the time (pathology score of 3); red, every day or every night if night time is indicated in question (pathology score of 4). Asterisk indicates an unanswered question. Dagger indicates that the individual answered twice per day during clinical interview.
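The colour code in the Fig. 1 legend can be read as a lookup from colour to pathology score. The sketch below illustrates that reading only; the score of 0 for grey is an assumption (the legend says "no pathology" without giving a number), and the function name and the example answers are hypothetical, not data from the figure.

```python
# Colour code from the Fig. 1 legend, mapped to the pathology score it denotes.
# Assumption: grey ("never, no pathology") is treated as a score of 0.
COLOUR_SCORES = {
    "grey": 0,    # never (no pathology)
    "blue": 1,    # less than half of the time
    "yellow": 2,  # half of the time
    "orange": 3,  # more than half of the time
    "red": 4,     # every day (or every night, for night-time questions)
}


def score_answers(colours):
    """Return one score per question; None marks an unanswered question."""
    return [None if colour is None else COLOUR_SCORES[colour] for colour in colours]


# Hypothetical answers for one respondent (not taken from the figure).
example = ["grey", "blue", None, "red"]
print(score_answers(example))
```

Urinary frequency is scored differently in the questionnaire, so this lookup would apply only to the remaining questions.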
umbrella cell nuclei were associated with Piezo2 transcript, and 12.6% of cells showed high expression of Piezo2 (1 s.d. above the mean). The density of Piezo2-positive cells varied across the bladder. These results show that Piezo2 is expressed in two distinct cell types in the lower urinary tract and could function in detecting relevant mechanical stimuli. We confirmed that urothelial cells express other mechanosensory proteins (Extended Data Fig. 1), but the functional deficits in PIEZO2-deficient humans focused our work on this protein.

We next used calcium imaging to determine whether bladder-stretch responses in sensory neurons were dependent on Piezo2. Whether neurons detect bladder stretch directly or downstream of urothelial cell activation, we expect this stimulus to cause calcium influx. We injected a viral vector carrying Cre recombinase into postnatal day 0 (P0) to P3 pups carrying the Cre-dependent calcium indicator GCaMP6f and a conditional Piezo2-knockout allele, Piezo2cKO (Piezo2fl/fl-GCaMP6f+/+; controls were GCaMP6f+/+)13. Thus, Piezo2 was deleted anywhere GCaMP6f was expressed. We observed rapid, robust responses in control sacral level 1 (S1) DRG neurons in response to manual, high-pressure bladder filling with saline (Fig. 2c), but these responses were markedly attenuated in Piezo2cKO DRG cells (Fig. 2d). Notably, cells responding to low-pressure stimuli were completely absent in Piezo2-knockout DRG (Fig. 2e). Calcium traces for cells in wild-type DRG revealed graded responses to pressure stimuli, with many cells responding to low and high pressures, but Piezo2cKO cells were silent at low pressures. Piezo2cKO DRG also had fewer cells responding to bladder stretch (Fig. 2f–m), but normal numbers of cells responding to painful pinch (Extended Data Fig. 1). This suggests that PIEZO2 is a key sensor of bladder stretch.

Mechanically-evoked micturition reflexes coordinate the bladder detrusor muscle and the urethral sphincter muscles to mediate efficient urinary control, and are critical for efficient voiding4. We hypothesized that reflex responses rely on PIEZO2 function to provide feedback control over bladder pressure and urethra activity. We therefore investigated micturition reflexes in mice lacking PIEZO2 in all caudal tissues. These Hoxb8Cre;Piezo2fl/fl mice express Cre recombinase in bladder-innervating DRG neurons14 and bladder urothelium (Extended Data Fig. 2a). We used cystometry with urethral electromyography to simultaneously monitor bladder pressure and sphincter activity. With continual filling at 30 μl min−1, control mice initiated bladder contractions at regular intervals, which are observed as pressure peaks that correspond to micturition events (Fig. 3a, Extended Data Fig. 2b). Piezo2-knockout mice displayed irregular micturition timing (Fig. 3b, Extended Data Fig. 2c) and, on average, longer intervals between bladder contractions that resulted in urination (Fig. 3c, Extended Data Fig. 2h). Therefore, PIEZO2-deficient mice are less sensitive to bladder filling, as it takes more volume to initiate bladder contractions. Piezo2-knockout mice also displayed higher bladder pressures five seconds before contraction peaks (Fig. 3d, Extended Data Fig. 2i). In healthy animals, low pressures are maintained during bladder filling because the detrusor muscle relaxes. The higher pressures that were observed before contractions suggest that this relaxation process or bladder compliance is impaired in Piezo2-knockout mice.

We next investigated individual micturition events to determine whether the bladder pressure required to sustain micturition was abnormal in these mice. We observed consistent pressure increases within and among wild-type mice (Fig. 3e, Extended Data Fig. 2d), but bladder pressure traces in Piezo2-knockout mice were highly variable (Fig. 3f, Extended Data Fig. 2e). The knockout mice exhibited higher peak bladder pressures (Fig. 3g), and required significantly more pressure during contractions, suggesting that the detrusor muscle must work harder to accomplish micturition (Fig. 3h, Extended Data Fig. 2j).

We also assessed whether sensory input via PIEZO2 is important for urethral reflexes, which sustain efficient urination. During bladder contractions in wild-type mice, there was coordinated engagement of the urethra muscle (Fig. 3i). This reliable urethral activity was markedly attenuated in Piezo2-knockout mice (Fig. 3j, k, Extended Data Fig. 2k). Knockout urethra responses varied from silent or weak coordination to inappropriately timed hyperactivity (Extended Data Fig. 2c). Hyperactivity is a sign of detrusor–sphincter dyssynergia, a condition involving uncoordinated communication between the muscle groups responsible for urination. This indicates that the urethra was not receiving the appropriate sensory input to govern its activity during micturition. Mice lacking PIEZO2 also had more variable, larger void volumes, as longer periods between contractions allow more bladder filling (Fig. 3l). Together, these data indicate that PIEZO2 sets stretch sensitivity in the lower urinary tract and initiates appropriately timed reflexes that contribute to efficient urination.

Next, we tested whether urination behaviour is altered in Piezo2-knockout mice. We placed mice on filter paper for 4 h and

[Fig. 2 image, panels a–m. a, CTB, Piezo2, Merge. b, DAPI, Krt20, Piezo2, Merge. c, d, Wild type, Knockout. e, DRG bladder-fill responses: no. of responding cells in WT and KO at high (>20 mmHg) and low (<20 mmHg) pressure. f, g, Pressure (scale bars, 20 mmHg and 300 s). h, i, Calcium avg. (all cells) (scale bar, 10% ΔF/F). j, k, Calcium traces (% ΔF/F). l, m, Calcium peak (% ΔF/F).]
Fig. 2 | Piezo2 is expressed in the lower urinary tract, and sensory neurons require PIEZO2 to detect low-pressure bladder filling. a, DRG neurons were retrogradely labelled using CTB (cyan, left) and fluorescent in situ hybridization (FISH) of DRGs with probes targeting Piezo2 (magenta, middle). Arrowheads point to Piezo2-expressing bladder neurons. Scale bar, 50 μm. The tracing experiment was repeated using three mice, n = 22–36 cells analysed per mouse. b, FISH for Krt20 (green) and Piezo2 (magenta) in bladder. Arrowheads point to Piezo2-expressing umbrella cells. Scale bar, 50 μm. FISH was performed on three bladders, with two technical replications. Analysis was performed on 80–117 nuclei per bladder. c, d, Image z-stack from GCaMP6f+/+ control mouse (c) and Piezo2 cKO mouse (d) S1 DRG during bladder filling. e, Count of cells responding to low-pressure (black) and high-pressure (red) bladder-filling stimuli in wild-type (WT) (n = 3 mice) and Piezo2 cKO (KO) (n = 4 mice) DRGs. f, g, Example pressure trace from wild-type (f) and Piezo2 cKO (g) DRG. Stimuli were interleaved during recording, but are shown sorted low to high, hence the discontinuous line. Data below these graphs are sorted together with the respective pressure peaks. h, i, Average per cent change in calcium fluorescence for all responding cells during the pressure peaks shown in f, g, respectively. j, k, Calcium traces for individual wild-type cells that responded to pressure stimuli in f (n = 17) (j) and for Piezo2 cKO cells responding to pressure stimuli shown in g (n = 6) (k). Each cell’s responses are shown on the same horizontal line. Cells are sorted by cumulative response to the four lowest-pressure stimuli. l, m, Maximum calcium response for the corresponding cells in j, k, 1 s after pressure peak.
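The calcium panels in Fig. 2 are reported as per cent change in fluorescence (% ΔF/F). As a minimal sketch of how such a value is commonly computed, the function below expresses each sample of a fluorescence trace relative to a baseline F0. The baseline definition (mean of the first few samples), the function name and the example numbers are assumptions for illustration; the analysis actually used is the one cited in the Methods.

```python
def percent_dff(trace, n_baseline):
    """Per cent change in fluorescence relative to a baseline F0.

    Assumption: F0 is the mean of the first `n_baseline` samples.
    """
    if n_baseline <= 0 or n_baseline > len(trace):
        raise ValueError("n_baseline must be between 1 and len(trace)")
    f0 = sum(trace[:n_baseline]) / n_baseline
    if f0 == 0:
        raise ValueError("baseline fluorescence is zero")
    return [100.0 * (f - f0) / f0 for f in trace]


# Hypothetical raw fluorescence values for one cell (arbitrary units).
raw = [100, 100, 110, 120, 100]
dff = percent_dff(raw, n_baseline=2)
print(dff)       # per-sample % dF/F
print(max(dff))  # peak response, comparable in spirit to panels l, m
```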
imaged the resulting urination patterns with UV illumination. We used only female mice to preclude territorial scent-marking behaviour. Wild-type mice typically urinated in the corners and edges of the cage in large spots (Fig. 3m). Piezo2-knockout mice had a variety of urination patterns, and some displayed urine leaking (small spots) or large voids towards the cage centre (Fig. 3n, o). This phenotype was not attributed to the knockout mice spending more time in the middle of the cage (Fig. 3p). Thus, knockout mice have abnormal urination behaviour, including some apparent incontinence.

We next studied whether this observed urinary dysfunction in Piezo2-knockout mice led to long-term consequences. Chronic urinary tract dysfunction typically causes tissue remodelling as the bladder wall grows thicker to compensate for inefficient voiding20,21. This remodelling can eventually result in ‘decompensation’, which is marked by a flaccid, ineffective bladder with sequelae of incomplete voiding, vesicoureteral reflux and increased frequency of urinary tract infections. Bladder-wall thickening was observed by haematoxylin and eosin staining in Piezo2-deficient mice (Fig. 3q–s). The weight of freshly excised bladders also revealed bladder-wall remodelling, as bladders from the knockout mice were significantly heavier than those from wild-type littermates (Fig. 3t, Extended Data Fig. 2m). Thus, impaired urinary reflexes in Piezo2-knockout mice lead to detrusor hypertrophy, an indicator of chronic voiding dysfunction.

We next investigated the cell types in which PIEZO2 was required. We tested whether PIEZO2 deficiency in urothelial cells changed the pressure threshold that is required to initiate micturition. We used the Upk2-cre allele to abrogate PIEZO2 activity in urothelial cells (Extended Data Fig. 3a), which have been proposed to act as stretch sensors and communicate to underlying neurons using ATP7,9,22–24. We found that Upk2-cre;Piezo2fl/fl knockout mice displayed similar phenotypes to the Hoxb8-cre;Piezo2fl/fl knockout mice, with higher bladder stretch thresholds, increased bladder pressure during micturition and attenuated urethral reflexes (Fig. 4a–h). In combination with expression data from FISH (Fig. 2b), these data indicate that PIEZO2 acts in umbrella cells to help set bladder-stretch sensitivity and initiate appropriate micturition reflexes. These results confirm the proposed role for umbrella cells as mechanosensory cells that participate in initiating micturition8.

We observed similar phenotypes in mice that lacked PIEZO2 only in sensory neurons (Fig. 4i–p). Deleting PIEZO2 in all sensory neurons is lethal, so we used Scn10a-cre mice25 (Scn10a encodes the voltage-gated sodium channel Nav1.8) to delete PIEZO2 in the Aδ- and c-fibre subsets, which are the primary sensory neuron types described

[Fig. 3 image, panels a–t. a, b, Wild type, Knockout pressure traces (scale bars, 20 mmHg and 100 s). c, Contraction intervals: interpeak interval (s), WT versus KO. d, Pre-contraction pressure: pressure (mmHg), WT versus KO. e, f, Wild type, Knockout: bladder pressure (mmHg) against time from peak (s). g, Peak bladder pressures: peak pressures (mmHg). h, Bladder contractions: contraction AUC (mmHg s). i, j, Urethra activity (μV) against time from peak (s). k, Urethra contractions: sum activity (μV). l, Void volume: volume (μl). m, n, Filter paper urination, Wild type, Knockout. o, Urine in centre (%). p, Time in centre (%). q, r, Wild type, Knockout bladder sections. s, Bladder muscle: muscle thickness (μm). t, Bladder weight: weight (mg).]
Fig. 3 | PIEZO2 is required for efficient micturition reflexes. a, b, Example pressure traces from three female wild-type mice (a) and three female Hoxb8-cre;Piezo2fl/fl mice (KO) (b) during continuous bladder filling. c, d, Bladder-contraction intervals (P < 0.0001) (c) and bladder pressure five seconds before contraction peaks (P < 0.0001) (d). e, f, Heat maps showing bladder contractions in wild-type (e) and Hoxb8-cre;Piezo2fl/fl (f) female mice. Each row represents bladder pressure during a single micturition event, with peaks aligned at 0. Arrowheads mark where data from one animal end and data from another begin. g, h, Peak bladder pressures (P < 0.0001) (g) and area under the curve (AUC) for bladder contractions (P < 0.0001) (h). i, j, Heat maps showing urethra activity in wild-type (i) and Hoxb8-cre;Piezo2fl/fl (j) female mice from e, f, with rows corresponding to bladder-contraction events in e, f. k, Urethra activity during micturition (P < 0.0001). l, Void volume measurements (P = 0.03). Student’s t-tests with Welch’s correction. In c–l, n = 6 (wild-type) and n = 5 (Hoxb8-cre;Piezo2fl/fl) female mice; n = 10–29 bladder contractions analysed per mouse. m, n, Urination patterns of five wild-type (m) and five Hoxb8-cre;Piezo2fl/fl (n) mice. o, Quantification of urine in the middle 50% of the cage (P = 0.0001). n = 11 female mice per group. p, Wild-type and Hoxb8-cre;Piezo2fl/fl mice spend similar amounts of time in the cage centre. q, r, Haematoxylin and eosin staining from wild-type (q) and Hoxb8-cre;Piezo2fl/fl (r) bladder sections, from 6- to 7-month-old littermates. Scale bars, 100 μm. The muscle layer is marked with vertical lines. s, t, Bladder muscle wall thickness (s; n = 5, P = 0.016) and total bladder weight (t; n = 9 (wild type) and 8 (Hoxb8-cre;Piezo2fl/fl), P = 0.0002). In o, s, t, Mann–Whitney test; otherwise, two-sided Student’s t-tests with Welch’s correction. Data are mean ± s.d.
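Most comparisons in Fig. 3 use a two-sided Student's t-test with Welch's correction, which does not assume equal variances. The sketch below shows the standard Welch statistic and the Welch–Satterthwaite degrees of freedom using only the Python standard library. The function name and the sample values are hypothetical, not measurements from this study, and the P value is left out because it needs a t-distribution CDF from a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    `a` and `b` are two independent samples with at least two values each.
    """
    # Squared standard errors of each group mean (sample variance / n).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical values for two groups (illustration only).
group_1 = [1, 2, 3, 4, 5]
group_2 = [2, 4, 6, 8, 10]
t_stat, dof = welch_t(group_1, group_2)
print(round(t_stat, 3), round(dof, 3))
```

Where SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns the two-sided P value.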
in the bladder10,26. This mouse line does not induce recombination in urothelial cells (Extended Data Fig. 3b–e). Sensory-neuron-specific Piezo2-knockout mice displayed longer intervals between contractions (Fig. 4i), but the pressure before contractions was not different from that in wild-type mice, as it was in urothelial-specific- and full caudal-knockout mice (Fig. 4j). This implies that mechanosensory stimuli activate PIEZO2 in umbrella cells to initiate bladder relaxation during filling (Fig. 4b) and that neuronal mechanosensing is dispensable for this process. Alternatively, it is possible that bladders become fibrotic and less compliant in Upk2-cre;Piezo2fl/fl and Hoxb8-cre;Piezo2fl/fl mice, but not in Scn10a-cre;Piezo2fl/fl mice. Neuronal PIEZO2-knockout mice do require more bladder pressure for micturition and have highly attenuated urethral reflex responses (Fig. 4o, p). These data implicate PIEZO2 in mediating neuronal stretch responses that are critical for downstream urethral reflexes. Of note, mice with Piezo2 knockout in individual tissues did not display the marked bladder remodelling that was observed in full caudal-knockout mice (Fig. 3s, t, Extended Data Fig. 3f, g), suggesting that urothelial or neuronal PIEZO2 alone could still contribute to urinary function. These results indicate that there is a two-part signalling mechanism involving PIEZO2 in umbrella cells and

[Fig. 4 image, panels a–p. a–h, Upk2-cre;Piezo2fl/fl; i–p, Scn10a-cre;Piezo2fl/fl. a, i, Contraction intervals: interpeak interval (s), WT versus KO. b, j, Pre-contraction pressure: pressure pre-peak (mmHg), WT versus KO. c, d, k, l, Wild type, Knockout heat maps against time from peak (s). e, f, m, n, Heat maps against time from peak (s). g, o, Bladder contractions: contraction AUC (mmHg). h, p, Urethra contractions: sum activity (μV).]
Fig. 4 | PIEZO2 functions in both bladder urothelium and sensory neurons. a–h, Cystometry data from wild-type and Upk2-cre;Piezo2fl/fl (KO) mice. n = 5 wild type and 4 Upk2-cre;Piezo2fl/fl female mice; 18–49 bladder contractions analysed per mouse. Cartoon in the top right depicts the lower urinary tract, with Piezo2 KO tissue in red. a, b, Intervals between bladder-contraction voids (P < 0.0001) (a) and bladder pressures five seconds before peak contraction (P = 0.001) (b). c, d, Bladder pressure events during continuous filling cystometry in wild-type (c) and Upk2-cre;Piezo2fl/fl (d) mice. e, f, Urethra activity recorded during the bladder contraction events shown in c, d. g, h, Bladder pressure during micturition events (P < 0.0001) (g) and urethral reflex responses during micturition (P = 0.03) (h). i–p, Cystometry data from wild-type and Scn10a-cre;Piezo2fl/fl (KO) mice. n = 3 wild type and 3 Scn10a-cre;Piezo2fl/fl female mice; 11–24 contractions per mouse. Cartoon in the top right depicts the lower urinary tract, with Piezo2 KO tissue in red. i, j, Intervals between bladder contractions (P = 0.002) (i) and bladder pressures five seconds before peak contraction (j). k, Bladder pressure events during continuous filling cystometry in wild-type (k) and Scn10a-cre;Piezo2fl/fl (l) mice. m, n, Urethra activity recorded during the bladder contraction events shown in k, l. o, p, Bladder pressure during micturition events (P = 0.004) (o) and urethral reflex responses during micturition (P < 0.0001) (p). Data are mean ± s.d. Two-sided Student’s t-test with Welch’s correction. *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001 and ****P ≤ 0.0001.
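The asterisk notation defined at the end of the Fig. 4 legend maps a P value to a significance label. A minimal sketch of that mapping follows; the function name is hypothetical, and returning "NS" for P > 0.05 is an assumption based on the "NS" label shown in the figure panel.

```python
def significance_stars(p_value):
    """Map a P value to the asterisk notation defined in the Fig. 4 legend."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value <= 0.0001:
        return "****"
    if p_value <= 0.001:
        return "***"
    if p_value <= 0.01:
        return "**"
    if p_value <= 0.05:
        return "*"
    return "NS"  # assumption: label used when P > 0.05


# P values quoted in the Fig. 4 legend for panels b, i, h and o.
for p in (0.001, 0.002, 0.03, 0.004):
    print(p, significance_stars(p))
```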
sensory neurons that set bladder sensitivity and promote micturition reflexes. Further investigations are required to address how these cell types communicate.

We have used evidence from mice and humans to identify the mechanotransduction channel PIEZO2 as a critical mediator of urinary tract function. Absence of Piezo2 in mice does not result in urinary tract paralysis and death, and PIEZO2-deficient humans are still able to urinate. This indicates that there are mechanotransduction proteins other than PIEZO2 in the urothelium and lower urinary tract sensory neurons. For example, the mechanotransduction ion channels TMEM63B and PIEZO1 are widely expressed in the urothelium, and PIEZO1 partially mediates urothelial stretch responses in vitro27.

Our results suggest a two-part model of mechanosensory signalling in the urinary tract, which is reminiscent of epithelial cell–neuronal sensory machinery in the skin (Merkel cell–neurite complexes), lung (neuroepithelial bodies) and intestine (enterochromaffin cells)15,17,28,29. Our results also implicate umbrella cells in mediating bladder relaxation during filling, perhaps by signalling to bladder muscle and/or

through stretch-induced cellular changes30. Future studies will address how urothelial cells and sensory neurons cooperate to control urinary function.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2830-7.

1. de Groat, W. C. & Yoshimura, N. Afferent nerve regulation of bladder function in health and disease. Handb. Exp. Pharmacol. 194, 91–138 (2009).
2. Keller, J. A. et al. Voluntary urination control by brainstem neurons that relax the urethral sphincter. Nat. Neurosci. 21, 1229–1238 (2018).
3. Hou, X. H. et al. Central control circuit for context-dependent micturition. Cell 167, 73–86 (2016).
4. Garry, R. C., Roberts, T. D. & Todd, J. K. Reflex responses of the external urethral sphincter of the cat to filling of the bladder. J. Physiol. 139, 13–14 (1957).
5. Cockayne, D. A. et al. Urinary bladder hyporeflexia and reduced pain-related behaviour in P2X3-deficient mice. Nature 407, 1011–1015 (2000).
6. Andersson, K. E., Gratzke, C. & Hedlund, P. The role of the transient receptor potential (TRP) superfamily of cation-selective channels in the management of the overactive bladder. BJU Int. 106, 1114–1127 (2010).
7. Mochizuki, T. et al. The TRPV4 cation channel mediates stretch-evoked Ca2+ influx and ATP release in primary urothelial cell cultures. J. Biol. Chem. 284, 21257–21264 (2009).
8. Merrill, L., Gonzalez, E. J., Girard, B. M. & Vizzard, M. A. Receptors, channels, and signalling in the urothelial sensory system in the bladder. Nat. Rev. Urol. 13, 193–204 (2016).
9. Apodaca, G., Balestreire, E. & Birder, L. A. The uroepithelial-associated sensory web. Kidney Int. 72, 1057–1064 (2007).
10. Zagorodnyuk, V. P., Brookes, S. J., Spencer, N. J. & Gregory, S. Mechanotransduction and chemosensitivity of two major classes of bladder afferents with endings in the vicinity to the urothelium. J. Physiol. 587, 3523–3538 (2009).
11. Murthy, S. E. et al. The mechanosensitive ion channel Piezo2 mediates sensitivity to mechanical pain in mice. Sci. Transl. Med. 10, (2018).
12. Ranade, S. S. et al. Piezo2 is the major transducer of mechanical forces for touch sensation in mice. Nature 516, 121–125 (2014).
13. Szczot, M. et al. PIEZO2 mediates injury-induced tactile pain in mice and humans. Sci. Transl. Med. 10, (2018).
14. Woo, S. H. et al. Piezo2 is the principal mechanotransduction channel for proprioception. Nat. Neurosci. 18, 1756–1762 (2015).
15. Woo, S. H. et al. Piezo2 is required for Merkel-cell mechanotransduction. Nature 509, 622–626 (2014).
16. Chesler, A. T. et al. The role of PIEZO2 in human mechanosensation. N. Engl. J. Med. 375, 1355–1364 (2016).
17. Nonomura, K. et al. Piezo2 senses airway stretch and mediates lung inflation-induced apnoea. Nature 541, 176–181 (2017).
18. Zeng, W. Z. et al. PIEZOs mediate neuronal sensing of blood pressure and the baroreceptor reflex. Science 362, 464–467 (2018).
19. Afshar, K., Mirbagheri, A., Scott, H. & MacNeily, A. E. Development of a symptom score for dysfunctional elimination syndrome. J. Urol. 182, 1939–1944 (2009).
20. Ehrhardt, A. et al. Urinary retention, incontinence, and dysregulation of muscarinic receptors in male mice lacking Mras. PLoS ONE 10, e0141493 (2015).
21. Flum, A. S. et al. Testosterone modifies alterations to detrusor muscle after partial bladder outlet obstruction in juvenile mice. Front Pediatr. 5, 132 (2017).
22. Takezawa, K. et al. Authentic role of ATP signaling in micturition reflex. Sci. Rep. 6, 19585 (2016).
23. Takezawa, K., Kondo, M., Nonomura, N. & Shimada, S. Urothelial ATP signaling: what is its role in bladder sensation? Neurourol. Urodyn. 36, 966–972 (2017).
24. Ferguson, D. R., Kennedy, I. & Burton, T. J. ATP is released from rabbit urinary bladder epithelial cells by hydrostatic pressure changes–a possible sensory mechanism? J. Physiol. 505, 503–511 (1997).
25. Agarwal, N., Offermanns, S. & Kuner, R. Conditional gene deletion in primary nociceptive neurons of trigeminal ganglia and dorsal root ganglia. Genesis 38, 122–129 (2004).
26. Sengupta, J. N. & Gebhart, G. F. Mechanosensitive properties of pelvic nerve afferent fibers innervating the urinary bladder of the rat. J. Neurophysiol. 72, 2420–2430 (1994).
27. Miyamoto, T. et al. Functional role for Piezo1 in stretch-evoked Ca2+ influx and ATP release in urothelial cell cultures. J. Biol. Chem. 289, 16565–16575 (2014).
28. Maksimovic, S. et al. Epidermal Merkel cells are mechanosensory cells that tune mammalian touch receptors. Nature 509, 617–621 (2014).
29. Alcaino, C. et al. A population of gut epithelial enterochromaffin cells is mechanosensitive and requires Piezo2 to convert force into serotonin release. Proc. Natl Acad. Sci. USA 115, E7632–E7641 (2018).
30. Wang, E. C. et al. ATP and purinergic receptor-dependent membrane traffic in bladder umbrella cells. J. Clin. Invest. 115, 2412–2422 (2005).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© The Author(s), under exclusive licence to Springer Nature Limited 2020

Methods

All experiments were performed within the protocols and guidelines approved by the Institutional Animal Care and Use Committees of The Scripps Research Institute in compliance with regulatory standards established by the Association for Assessment and Accreditation of Laboratory Animal Care International (AAALAC).

Statistics
Unless otherwise noted in the legends, groups were compared using two-tailed Student’s t-test with Welch’s correction, as groups did not have equal variances. For comparisons of groups with an n less than 15, we used the non-parametric Mann–Whitney test to assess differences, as we could not assess distributions. These tests were indicated in the figure legends. No statistical test was used to pre-determine sample size. Instead, sample size was determined by animal availability and previous studies in the field, which found these sample sizes sufficient to detect deficits.

Study design
We established exclusion criteria before collecting cystometry data: data from the first 30 min of cystometry recording were not used because bladder muscle activity has often not stabilized. Moreover, animals that displayed bladder leaking during recording were excluded from analysis, as leaking indicated a flawed seal and thus inaccurate filling responses. To verify the reproducibility of experimental findings, we restricted the time of day that cystometry recordings were done (Zeitgeber 8–14) and we performed every experiment in a cohort of male and female mice to compare to their wild-type littermates. The order of recordings for different genotypes was randomized. The experimenter was blind to genotype when possible. Hoxb8-cre+;Piezo2fl/fl knockout mice have obvious motor impairments, so the experimenter was never blind to genotype for these groups.

Mice
Mice were kept in standard housing with a 12:12 h light:dark cycle set with lights on from 06:00–18:00, with room temperature kept around 22 °C, with humidity between 30–80% (not controlled). Adult male and female mice were used as indicated in the text. Age-matched knockout and wild-type littermates were tested at the same age in each cohort, but ages tested ranged from 5–8 months for Hoxb8-cre;Piezo2fl/fl, 6–12 months for Scn10a-cre and 4–6 months for Upk2-cre. The Hoxb8-cre;Piezo2fl/fl mouse line has been previously described11. GCaMP6f+/+ mice (B6;129S-Gt(ROSA)26Sortm95.1(CAG-GCaMP6f)Hze/J, Jackson Laboratory: Ai95, 024105) were bred to Piezo2fl/fl mice as described previously13,31. Piezo2fl/fl mice were mated with Scn10a-cre mice25 or Upk2-cre mice (B6(129)-Tg(Upk2-cre)1Rkl/WghJ, Jackson Laboratory: 029281) to create sensory-neuron-specific and urothelial-specific Piezo2-knockout animals, respectively. Each of these Cre lines was also crossed with Ai9 mice (B6.Cg-Gt(ROSA)26Sortm9(CAG-tdTomato)Hze/J, Jackson Laboratory:

Fluorescent in situ hybridization
Bladder and DRG tissues were removed immediately from euthanized animals, and flash frozen in liquid nitrogen. The protocol for RNAscope Multiplex Fluorescent Reagent Kit V2 (ACDBio: 323100) was followed according to instructions for fresh frozen tissue. A gentler protease (III) was applied for only 25 min to lessen CTB-fluorophore degradation in DRG tissue and delicate bladder tissue. Probes for Piezo2 (ACDBio: 400191 or 439971), Krt20 (ACDBio: 402301), Piezo1 (ACDBio:) or Tmem63b (ACDBio: 431531) were applied to detect transcript. Quantification was performed in ImageJ using regions of interest to define quantification area and then by measuring mean pixel intensity. Only DRG cells with nuclei visible were quantified to prevent double-counting of the same cells in different sections. Multinucleated umbrella cell borders could not always be defined, so quantification was done per nucleus, with a set region of interest (ROI) size used for all nuclei (roughly twice the diameter of the nucleus). ROIs were centred on each nucleus and used to measure mean pixel intensity. Control area intensity was measured from tissue background to define an intensity cut-off for negative cells.

DRG calcium imaging
The viral strategy was described previously13. Animals were anesthetized with isoflurane and the bladder was exposed (described above). The bladder apex was opened and flanged PE20 tubing was filled with saline and inserted as a catheter (Stoelting: 51154) into the bladder, and tied off using silk suture (Fisher Scientific: NC9140103). A syringe and lines filled with saline that were connected to the catheter were used to fill the bladder and test for leaking. The abdomen was sewn shut with the catheter extending out. The animal was flipped to prone position, and the vertebral column was exposed. A microdrill was used to open a window in the bone above the first sacral DRG. Epifluorescence imaging was performed using an upright microscope (FVMPE-RS, Olympus) equipped with a 4×, 0.28–numerical aperture air objective. Illumination was provided with a 130-W halogen light source (U-HGLGPS, Olympus), using a standard green excitation/emission filter cube. Images were acquired using an ORCA-Flash 4.0 CMOS camera (Hamamatsu) at a frame rate of 5 Hz using MetaMorph (Molecular Devices). Analysis was previously described13.

Cystometry and electromyography
Male and female mice were anesthetized by isoflurane (5% induction, 1–2% maintenance, Kent Scientific SomnoSuite) and the bladder was catheterized and connected to saline lines (described above). Tungsten electrodes were inserted directly into the urethral muscle (A-M systems: 795500). Saline lines were connected to a pressure sensor (Biopac: RX104A-MRI) which connected via pressure transducer (TSD104A) to an MP160 Biopac system amplifier (DA100C). Electromyography electrodes were connected to a differential amplifier (EMG100C: gain 1,000, sample rate 10 kHz, low-pass filter 5 kHz, 60 Hz notch filter and
07909) to assess Cre expression. Genotyping was performed using 100 Hz high-pass filter). Bladder was continuously filled at 20 μl min−1
guidelines from Jackson Laboratory. using a syringe pump until regular urination cycles began. Data was
not collected until the mouse had been stably cycling (30–40 min after
Retrograde labelling of sensory neurons beginning of recording), at which point filling rate was increased to
Mice were anesthetized with isoflurane. Their lower abdomen was 30 μl min−1. Data were logged with Acqknowledge software (v.4.4.2)
shaved and hair was removed using Nair (Fisher Scientific: NC0132811), and processed in MATLAB (v.2018b).
and sterilized with ethanol and iodine, cleaned and a midline laparot-
omy was performed to expose the bladder. Three to five injections of Behavioural assays
1–2 μl of cholera toxin B–Alexa 488 (Fisher Scientific: C22841) were Female mice were placed in normal home cages that were bottom-
made in the bladder wall using a Hamilton syringe. Care was taken to less, set on filter paper (Fisher Scientific: 05-714-4) and left in a dark-
avoid puncturing through to the bladder lumen. The abdominal wall and ened room for 4 h. Mice did not have access to water to prevent the
skin were sutured separately, and mice were given subcutaneous water leaks from disturbing urine marks. Paper was imaged using a
flunixin (0.1 ml per g body weight) for post-operative care. We waited widefield camera (Logitech C930e) while illuminated by UV light.
three to five days before taking tissue to allow the dye to reach the Images were thresholded and converted to B&W binary in ImageJ
cell soma. (v.2.0.0-rc-49/1.51d), and total number of black pixels was counted.
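The thresholding and pixel counting used for the urine-mark images in this section, including the count within the central region of the image, can be illustrated with a minimal Python sketch. This is not the authors' ImageJ procedure: the function name, the threshold value, the choice to treat pixels above the cut-off as marks, and the reading of "middle 50% of the image area" as a centred rectangle with each side scaled by the square root of 0.5 are all assumptions made for illustration.

```python
import numpy as np


def count_marked_pixels(image, threshold, central_fraction=0.5):
    """Threshold a greyscale image; count marked pixels overall and in a central ROI.

    Assumption: pixels brighter than `threshold` are urine marks (the "black"
    pixels of the binary image described in the text).
    """
    image = np.asarray(image, dtype=float)
    marked = image > threshold  # boolean mask standing in for the binary image

    rows, cols = marked.shape
    # Assumed ROI: a centred rectangle covering `central_fraction` of the image
    # area, obtained by scaling each side by sqrt(central_fraction).
    scale = np.sqrt(central_fraction)
    roi_rows = int(round(rows * scale))
    roi_cols = int(round(cols * scale))
    r0 = (rows - roi_rows) // 2
    c0 = (cols - roi_cols) // 2
    central = marked[r0:r0 + roi_rows, c0:c0 + roi_cols]

    return int(marked.sum()), int(central.sum())


# Toy example: one mark in a corner, one near the centre of an 8 x 8 image.
toy = np.zeros((8, 8))
toy[0, 0] = 255
toy[4, 4] = 255
total_marked, central_marked = count_marked_pixels(toy, threshold=128)
```

Comparing the central count with the total gives a simple measure of how much marking falls in the middle of the cage rather than near its edges.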
A region of interest corresponding to the middle 50% of the image area was used to count the number of black pixels in the middle of the cage.

Bladder histology
Bladders used for histology were the same bladders used for the bladder weight measurements. Wild-type and knockout littermate male bladders were collected, opened and blotted on kimwipes before weighing. After recording their freshly excised weight, they were fixed in 10% Neutral Buffered Formalin for 24 h, and then stored in 70% ethanol. Bladders were paraffin embedded, and central cross-sections were used for haematoxylin and eosin staining. Processing and staining was performed by Scripps Histology Core services. Muscle layer thickness was measured in ImageJ at 7–11 different places along the muscle wall per animal perpendicular to the muscle surface, and an average thickness value is shown per mouse.

Clinical assessment
Twelve patients with PIEZO2 loss-of-function mutations from 11 families (n = 4 males and 8 females, ranging in age from 4 to 43) were evaluated at the National Institutes of Health (NIH) under research protocol approved by the Institutional Review Boards of National Institute of Neurological Disorders and Stroke (NINDS, protocol 12-N-0095) between April 2015 and May 2020. Written informed consent and/or assent (for minor patients) was obtained from each participant in the study. All of the patients had biparentally inherited bi-allelic homozygous or compound heterozygous nonsense variants that are expected to result in a ‘null’ status for protein expression. Patients with PIEZO2 loss-of-function either found us or were referred to our group through our network of international collaborators. Genotype information can be found in Supplementary Table 1, along with past treatments and diagnoses. One patient, P10, carried a nonsense and a deleterious splice site variant in compound heterozygosity. Also as stated above, all patients presented with a profound congenital ubiquitous lack of proprioception, vibration, and specific loss of touch discrimination on glabrous skin. Detailed history, clinical evaluation and testing were conducted including an in-depth review of urinary function, urological history, review of previous evaluations and non-invasive bladder ultrasound. Patients were recruited from all over the world and their age ranged between 5 and 43 years (see table). Four adult patients (3 females and 1 male) provided their own history. None of the patients were taking any medications that could affect urinary function at the time of the questionnaire. Parents assisted with information gathering from their children.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
The raw data that support the findings of this study are available from the corresponding authors upon reasonable request.

Code availability
Code for calcium imaging analysis is previously published13. MATLAB (v.2018b) code used for cystometry analysis is available at https://github.com/PatapoutianLab/cystometry.

31. Chen, T. W. et al. Ultrasensitive fluorescent proteins for imaging neuronal activity. Nature 499, 295–300 (2013).

Acknowledgements We thank D. Barucha-Goebel, A. R. Foley and S. Donkervoort for help with the clinical assessments; G. Averion, C. Jones and K. Brooks for experimental assistance; S. Ma and S. Simpson-Dworschak for early work on the project; E. Lacefield for helpful discussions; and the Scripps Histology Core for sample preparation. This work was supported by the Howard Hughes Medical Institute; the NIH grants R35 NS105067 to A.P., F32 DK121494 to K.L.M. and R01 NS108439 to L.S.; and the NIH Intramural Research Program funding from the National Center for Complementary and Integrative Health (A.T.C.) and from the National Institute of Neurological Disorders and Stroke (A.T.C. and C.G.B.).

Author contributions K.L.M. designed and performed all mouse cystometry, behavioural experiments and tissue histology, analysed data and, together with A.P., wrote the manuscript. D.S., T.O., C.G.B. and A.T.C. designed and performed the human clinical assessments. Calcium imaging and analysis was performed by N.G., K.L.M. and M.S. Retrograde labelling and FISH experiments were performed by K.L.M., A.M.C. and I.D. J.K. and L.T.S. contributed analytical tools for data analysis, technical support and conceptual project design. C.G.B., A.T.C. and A.P. contributed to project design and supervision. All authors discussed results and contributed to manuscript editing.

Competing interests The authors declare no competing interests.

Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2830-7.
Correspondence and requests for materials should be addressed to A.T.C. or A.P.
Peer review information Nature thanks Eric Honoré, Jon Levine and Mark Nelson for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Article
Extended Data Fig. 1 | The bladder urothelium expresses multiple mechanosensitive proteins, and PIEZO2 is not required for sensory neuron pinch responses. a, FISH in bladder tissue with probes against Krt20 (green) and Piezo1 (white). DAPI in blue. b, FISH in bladder tissue with probes against Krt20 (green) and Tmem63b (white). DAPI in blue. c, Z-projection of the standard deviation of responses from genital pinch in WT and d, Piezo2 cKO DRG. e, Quantification of peak responses during pinch shown as percent of baseline (each data point is one cell). n = 3 DRGs, 40 cells for WT, 4 DRGs and 69 cells for Piezo2 cKO DRGs.
Extended Data Fig. 2 | PIEZO2 is required for efficient micturition reflexes in male mice. a, Hoxb8-cre;Ai9 bladder tissue, fixed, frozen and mounted to show tdTomato (red) throughout the tissue, labelled with DAPI (blue). Scale is 100 μm. Expression was evaluated in two mice. b, Example pressure and urethra activity traces from three wild-type males and c, three Hoxb8-cre;Piezo2 fl/fl knockout male littermates. d, Heat map of individual bladder contraction events in wild-type and e, knockout male mice, with corresponding urethra activity below in f and g respectively. h, Bladder contraction intervals for males. i, Bladder pressures five seconds before peak contraction for males. Note: 1,200 s was the length of one recording. These dots represent recording periods in which the animal had no successful urination events. j, Total bladder pressure for males and k, sum of urethra activity during bladder contractions. n = 6 males per group. P < 0.0001 for graphs in h, i, j and k, two-sided Student’s t-test with Welch’s correction. l, Body weights from a subset of mice whose bladder weights are shown in Fig. 2t, and m, bladder weights from animals in l, shown as a percentage of body weight. Red horizontal lines indicate means, vertical red bars indicate +/− standard deviation (shown where possible).
[Extended Data Fig. 3 image. Panels: a, Upk2-cre;Ai9 and b, Scn10a-cre;Ai9 (scale bar, 200 µm); c–e (scale bar, 50 µm); f, Upk2-cre bladder weight and g, Scn10a-cre bladder weight (y axis: weight (mg), 0–50; x axis: WT, KO).]
Extended Data Fig. 3 | Upk2- and Scn10a-cre expression and bladder weights. a, Upk2-cre;Ai9 bladder tissue fixed, frozen and mounted to show
tdTomato (red) throughout the urothelium, labelled with DAPI (blue).
Expression was evaluated in two mice. b, Scn10a-cre;Ai9 bladder tissue fixed,
frozen and mounted to show tdTomato (red) is not present. Expression was
evaluated in two mice. Thin cryosections made neuronal endings difficult to
visualize. Scale: 200 μm, applies to a and b. c, Scn10a-cre;Ai9 DRG tissue
showing tdTomato (red) in the majority of neurons, and d, a cell backlabelled
with CTB-Alexa 488 injected into bladder. e, Merge of c and d, DAPI in blue.
9/9 backlabelled bladder cells analysed from two mice were tdTomato positive.
f, Quantification of freshly excised bladder weights from four Upk2-cre;Piezo2fl/fl
knockout and wild-type littermates. Age-matched littermates were 10–11
months old, which could account for greater variability. g, Bladder weights
from age-matched Scn10a-cre;Piezo2 fl/fl knockout mice and wild-type
littermates, 7–8 months old. Red lines indicate mean values.
Corresponding author(s): Ardem Patapoutian and Alexander Chesler
Last updated by author(s): Jul 12, 2020
Reporting Summary
Statistics

Software and code

Data collection Acqknowledge software was used for data collection (version 4.4.2)
Data analysis ImageJ (version 2.0.0-rc-49/1.51d), and Acqknowledge software (version 4.4.2) were used, and versions added to the supplementary
information. Code availability statement in manuscript: "Code for calcium imaging analysis is previously published13. Matlab (R2018b) code
was used for cystometry analysis and is available at: https://github.com/PatapoutianLab/cystometry."
Data

The raw data that support the findings of this study are available from the corresponding author upon reasonable request.

Sample size No statistical test was used to pre-determine sample size. Instead, sample size was determined by animal availability and previous studies in
the field, which found these sample sizes sufficient to detect deficits during cystometry (Keller et al., 2018 PMID: 30104734) and histological
differences in remodeling and behavior (Everaerts and Zhen, 2010, PMID: 20956320).
Data exclusions We established exclusion criteria prior to collecting cystometry data: data from the first 30 minutes of cystometry recording was not used
because bladder muscle activity has often not stabilized. Moreover, animals that displayed bladder leaking during recording were excluded
from analysis, as leaking indicated a flawed seal and thus inaccurate filling responses.
Replication FISH experiments were independently replicated 2-3 times with the same results. Cystometry recordings were performed in independent
male and female cohorts, and results were replicated. To verify the reproducibility of experimental findings, we restricted the time of day that
cystometry recordings were done (Zeitgeber 8-14) and we performed every experiment in a cohort of male and female mice to compare to
their wildtype littermates.
Randomization The order of recordings for different genotypes was randomized. Beyond this, assigning animals to experimental groups is not relevant to this
study, as the groups are defined by genotype. Animals of different sexes were analyzed independently to remove this covariate.
Blinding The experimenter was blind to genotype when possible for all experiments. HoxB8Cre+;Piezo2f/f knockout mice have obvious motor
impairments, so it was impossible to keep the experimenter blind for these groups.



Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research
Laboratory animals Mice were kept in standard housing with 12 h light/dark cycle set with lights on from 6 AM to 6 PM, with room temperature kept
around 72 degrees Fahrenheit, with humidity between 30-80 % (not controlled). Adult male and female mice were used as indicated
in the text. Age-matched knockout and wildtype littermates were tested at the same age in each cohort, but ages tested ranged from
5-12 months. The HoxB8Cre;Piezo2f/f mouse line has been previously described11. GCaMP6f+/+ mice
(B6;129S-Gt(ROSA)26Sortm95.1(CAG-GCaMP6f)Hze/J, Jackson Laboratory: Ai95, #024105) were bred to Piezo2f/f mice as described
previously13,35. Piezo2f/f mice were mated with SNSCre mice29 or UPK IICre mice (B6(129)-Tg(Upk2-cre)1Rkl/WghJ, Jackson
Laboratory: #029281) to create sensory-neuron specific and urothelial specific Piezo2 knockout animals, respectively. Each of these
Cre lines was also crossed with Ai9 mice (B6.Cg-Gt(ROSA)26Sortm9(CAG-tdTomato)Hze/J, Jackson Laboratory: #07909) to assess Cre
expression.
Wild animals No wild animals were used in this study
Field-collected samples No field collected samples were used in this study.
Ethics oversight All experiments were performed within the protocols and guidelines approved by the Institutional Animal Care and Use Committees
of The Scripps Research Institute in compliance with regulatory standards established by the Association for Assessment and
Accreditation of Laboratory Animal Care International (AAALAC).

Policy information about studies involving human research participants
Population characteristics Twelve patients with PIEZO2 loss-of-function mutations from 11 families (N=4 males and 8 females, ranging in age from 4 to
43) were evaluated at the National Institutes of Health (NIH) under research protocol approved by the Institutional Review
Boards of National Institute of Neurological Disorders and Stroke (NINDS, protocol 12-N-0095) between April of 2015 and
May of 2020. Written informed consent and/or assent (for minor patients) was obtained from each participant in the study.
Genotype information can be found in Extended Data Table 1, along with past treatments and diagnoses.
Recruitment Patients were recruited on the basis of their biparentally inherited bi-allelic homozygous or compound heterozygous
nonsense variant mutations in the Piezo2 gene. Patients with PIEZO2 loss of function either found us, or were referred to our
group through our network of international collaborators. The nature of this group means that we are only analyzing patients
without functional Piezo2, which is the goal of the study.
Ethics oversight Research protocol approved by the Institutional Review Boards of National Institute of Neurological Disorders and Stroke
(NINDS, protocol 12-N-0095)
Clinical data
Policy information about clinical studies
All manuscripts should comply with the ICMJE guidelines for publication of clinical research and a completed CONSORT checklist must be included with all submissions.
Clinical trial registration This is not a clinical trial, but approval was through: NINDS, protocol 12-N-0095
Study protocol NINDS, protocol 12-N-0095
Data collection Data were collected at the NIH between April of 2015 and May of 2020.
Outcomes No outcomes measured.
Article
Chemico-genetic discovery of astrocytic control of inhibition in vivo

https://doi.org/10.1038/s41586-020-2926-0
Accepted: 19 August 2020

Tetsuya Takano1 ✉, John T. Wallace1, Katherine T. Baldwin1, Alicia M. Purkey1, Akiyoshi Uezu1, Jamie L. Courtland2, Erik J. Soderblom1,3, Tomomi Shimogori4, Patricia F. Maness5,6, Cagla Eroglu1,2 ✉ & Scott H. Soderling1,2 ✉

Perisynaptic astrocytic processes are an integral part of central nervous system synapses1,2; however, the molecular mechanisms that govern astrocyte–synapse adhesions and how astrocyte contacts control synapse formation and function are largely unknown. Here we use an in vivo chemico-genetic approach that applies a cell-surface fragment complementation strategy, Split-TurboID, and identify a proteome that is enriched at astrocyte–neuron junctions in vivo, which includes neuronal cell adhesion molecule (NRCAM). We find that NRCAM is expressed in cortical astrocytes, localizes to perisynaptic contacts and is required to restrict neuropil infiltration by astrocytic processes. Furthermore, we show that astrocytic NRCAM interacts transcellularly with neuronal NRCAM coupled to gephyrin at inhibitory postsynapses. Depletion of astrocytic NRCAM reduces numbers of inhibitory synapses without altering glutamatergic synaptic density. Moreover, loss of astrocytic NRCAM markedly decreases inhibitory synaptic function, with minor effects on excitation. Thus, our results present a proteomic framework for how astrocytes interface with neurons and reveal how astrocytes control GABAergic synapse formation and function.
The majority of central nervous system (CNS) synapses are ensheathed by tiny astrocytic processes1,2. These astrocytic contacts are an integral functional compartment of the tripartite synapse, which is defined as the combination of pre- and postsynaptic neuronal, and perisynaptic astrocytic, processes3. At the synapse, astrocytes control basal synaptic transmission, neuromodulation, ionic balance and neurotransmitter clearance4–8. Furthermore, astrocyte and synapse development are interdependent processes that are regulated by dynamic bidirectional intercellular communication via secreted factors and cell adhesion molecules9–12. Historically, however, gaining molecular insights into perisynaptic astrocyte–neuron signalling has been hampered owing to the lack of biochemical methods for isolating this astrocytic compartment.

To identify proteins at the extracellular clefts between astrocytes and neurons, we developed a chemico-genetic in vivo BioID (iBioID) approach, based on reconstituting the enzymatic activity of a proximity-biotinylating enzyme, TurboID13, at astrocyte–neuron junctions (Fig. 1a). Recent studies have shown that split biotinylation constructs could recover enzymatic activity when they were in close proximity in the cell cytoplasm14,15. In this study, we used our glycosylphosphatidylinositol-anchored reconstitution-activated proteins highlight intercellular connections (GRAPHIC) strategy16 to direct N- and C-terminal TurboID fragments to the extracellular surface of neurons and astrocytes (Fig. 1a, Extended Data Fig. 1a). Of the two Split-TurboID construct pairs that we tested in HEK 293T cells (Extended Data Fig. 1a), the biotinylation activity of Split 1-TurboID was higher than that of Split 2-TurboID (Extended Data Fig. 1b, lane 8). We therefore used Split 1-TurboID for the remainder of this study, and named the two molecules N-TurboID and C-TurboID. We also used a GRAPHIC-tagged full-length TurboID construct, TurboID-surface, to biotinylate astrocyte surface proteins (Extended Data Fig. 1a, bottom).

In astrocyte–neuron co-cultures, astrocytes expressing TurboID-surface under the control of the GfaABC1D promoter17 (Extended Data Fig. 1c) exhibited biotinylation activity along their membranes (Extended Data Fig. 1d). Moreover, the reconstituted activity of Split-TurboID was found only at contact sites between neurons and astrocytes (Extended Data Fig. 1c), but not when either half was expressed alone (Extended Data Fig. 1d). To investigate whether TurboID-surface or Split-TurboID biotinylates tripartite synapses in these cultures, astrocytes were co-transduced with GfaABC1D-mCherry-CAAX to mark astrocyte membranes, and synapses were labelled by immunostaining with pre- and postsynaptic markers (excitatory, VGLUT1 and HOMER1; inhibitory, VGAT and gephyrin). Both constructs mediated biotinylation that overlapped with astrocytic membranes and closely associated with excitatory and inhibitory synaptic markers (Extended Data Fig. 2a–d), demonstrating the functional reconstitution of TurboID transcellularly at perisynaptic astrocyte–neuron junctions in vitro.

To test their activity in vivo, the constructs were introduced into mouse brain astrocytes and/or neurons via retro-orbital injections
1The Department of Cell Biology, Duke University Medical School, Durham, NC, USA. 2Department of Neurobiology, Duke University Medical School, Durham, NC, USA. 3Duke Proteomics and
Metabolomics Shared Resource and Duke Center for Genomic and Computational Biology, Duke University Medical School, Durham, NC, USA. 4Molecular Mechanisms of Brain Development,
Center for Brain Science (CBS), RIKEN, Saitama, Japan. 5Department of Biochemistry, University of North Carolina School of Medicine, Chapel Hill, NC, USA. 6Department of Biophysics,
University of North Carolina School of Medicine, Chapel Hill, NC, USA. ✉e-mail: tetsuya.takano@keio.jp; cagla.eroglu@duke.edu; scott.soderling@duke.edu

[Fig. 1 image. Panels: a, Split-TurboID biotinylation between astrocyte, presynapse and postsynapse; b, cell-type-specific AAVs (AAV PHP.eB-hSynI-N-TurboID and AAV PHP.eB-GfaABC1D-C-TurboID) and timeline (AAV injection, biotin injection, analysis; P21, P42, P48); c, hSynI-eGFP (neuron), GfaABC1D-mCherry–CAAX (astrocyte), streptavidin and merge for control, TurboID-surface and Split-TurboID (scale bar, 5 μm); d, e, STED images for TurboID-surface and Split-TurboID (scale bars, 1 μm); f, streptavidin colocalization (% puncta per 100 μm2) with VGLUT1, PSD95, VGAT, gephyrin and other, for TurboID-surface and Split-TurboID.]
Fig. 1 | Identification of the astrocyte–neuron synaptic cleft proteome using in vivo Split-TurboID. a, Schematic of the Split-surface iBioID approach. b, Outline of Split-TurboID method using cell-type-specific AAVs. ITR, inverted terminal repeats; hSyn1, human synapsin 1 promoter; GPI, glycosylphosphatidylinositol; WPRE, woodchuck hepatitis virus post-transcriptional regulatory element; pA, polyadenylation. c, Confocal images of cortical expression of Split-TurboID or TurboID-surface coexpressed with neuronal eGFP and astrocyte mCherry–CAAX. d, e, Three-colour STED images showing biotinylated proteins adjacent to excitatory synaptic markers PSD95 and VGLUT1 (d), and inhibitory synaptic markers gephyrin and VGAT (e). f, The ratio of biotinylated proteins that colocalized with VGLUT1, PSD95, VGAT or gephyrin. n = 15 cells per each condition from 3 mice. n = 3 biological repeats. Student’s paired t-test, comparing TurboID-surface and Split-TurboID. Data are mean ± s.e.m.
of adeno-associated viruses (AAVs)18 at postnatal day (P)21 (Fig. 1b, Extended Data Fig. 3a–c) and the mice were given subcutaneous biotin injections starting at P42 for 7 days (Fig. 1b)19. Biotinylated proteins were detected by immunoblotting and immunohistochemistry (Extended Data Fig. 4a–d) both for astrocyte-specific TurboID-surface and reconstituted Split-TurboID constructs. However, when Split-TurboID fragments were expressed alone, no biotinylation was observed (Extended Data Fig. 4a–c). These results show that the TurboID-surface and Split-TurboID constructs generate extracellular biotinylation in vivo.

To confirm that biotinylated proteins localize to neuron–astrocyte contacts, we labelled neurons with eGFP (using AAV PHP.eB-hSynI-eGFP) and astrocyte membranes with mCherry–CAAX (using AAV PHP.eB-GfaABC1D-mCherry-CAAX) and co-injected either astrocyte-specific TurboID-surface or Split-TurboID-expressing viruses. In both conditions, biotinylated proteins were located at the contacts between astrocytic and neuronal processes (Fig. 1c). Using super-resolution stimulated emission depletion (STED) microscopy, we found that biotinylated proteins surround excitatory and inhibitory synapses (Fig. 1d, e). More than 50% of TurboID-surface-induced biotinylation and more than 90% of Split-TurboID-induced biotinylation was closely associated with synaptic markers (Fig. 1f). The densities of synapses were not affected by either labelling approach (Extended Data Fig. 4e, f). Together, these results show that the TurboID-surface and Split-TurboID constructs effectively biotin-label perisynaptic contacts between astrocytes and neurons in vivo.

Perisynaptic cleft proteome discovery
To identify the tripartite synaptic proteins, proteins biotinylated by Split-TurboID or astrocyte-specific TurboID-surface constructs were purified and analysed by quantitative high-resolution liquid chromatography–tandem mass spectrometry (LC–MS/MS) (Fig. 2a). When combined, the Split-TurboID and astrocyte-specific TurboID-surface datasets identified 776,376 peptides corresponding to 3,171 distinct proteins (Extended Data Fig. 4g). After three independent experiments and following removal of known contaminants19, 173 and 178 proteins were found to be significantly enriched (1.5-fold) in Split-TurboID and astrocyte-specific TurboID-surface fractions, respectively, compared with soluble TurboID control (Extended Data Fig. 4g–i, Supplementary Tables 1, 2). This enrichment approach is stringent, and thus may not identify all astrocytic proteins that are present at perisynaptic processes, as it selects only those that are overrepresented at synapses compared with other compartments.

A total of 118 proteins were common between the two datasets, yielding a high-confidence tripartite synapse proteome (Fig. 2b, Extended Data Fig. 4g–i, Supplementary Table 3). This list includes known tripartite synapse proteins such as neuroligin-3 and neurexin-1 (ref. 9), calcium channel auxiliary subunits that also regulate glutamate receptor trafficking (CACNA2D3, CACNG2 and CACNG3), excitatory synaptic proteins such as AMPA receptors (GRIA2 and GRIA3), and inhibitory synaptic proteins such as type A γ-aminobutyric acid (GABAA) receptors (GABRA1, GABRA4, GABRB2 and GABRG2) (Fig. 2b). By cross-referencing our proteomics data set with cell-type-specific gene-expression databases20,21, we found that messenger RNA for 33 of these proteins was enriched in astrocytes (RNA-sequencing expression ratio >1.0, diamonds in Fig. 2b), 76 were enriched in neurons (circles in Fig. 2b) and 5 proteins had equal or unknown distribution (Fig. 2b). Bioinformatics analysis showed that our high-confidence tripartite proteome contained known synaptic cleft proteins (29 proteins, 25%), cell adhesion proteins (18 proteins, 15%), channels (18 proteins, 15%), G-protein-coupled receptors and

[Fig. 2 image. a, Workflow: AAV injection (Split-TurboID, TurboID-surface), biotin injection, mouse cortex dissection, lysate, biotinylated protein purification on streptavidin beads, LC–MS/MS data acquisition (intensity versus m/z). b, Network of proteins shared by Split-TurboID and TurboID-surface, labelled by gene symbol, with clusters for previous synaptic cleft proteomics, cell adhesion molecules, channels and VGCCs, receptors, secreted and extracellular matrix, and neurological disorders. Node key: excitatory synaptic proteins; inhibitory synaptic proteins; both excitatory and inhibitory synaptic proteins; synaptic proteins without specificity information; synaptic orphans.]
Fig. 2 | The astrocyte–neuron synaptic cleft proteome. a, Outline of proteomic approach. b, Left, overlapping high-confidence proteins shared between the Split-TurboID and TurboID-surface enriched fractions. Right, clustergram topology of proteins in selected functional categories. Node titles show the corresponding gene symbols and node size represents log2 fold enrichment over negative control. Proteins for which gene expression is enriched in neurons (mRNA expression in astrocyte/neuron <1) are in circles, those for which gene expression is enriched in astrocytes (mRNA expression in astrocyte/neuron ≥1.0) are in diamonds. Edges are shaded according to the type of interaction (grey, iBioID; black, previously reported protein–protein interactions).
associated proteins (4 proteins, 3%), other receptors and associated (measured by neuropil infiltration volume (NIV)) (Extended Data
proteins (16 proteins, 14%), secreted or extracellular matrix compo- Fig. 5h–k). By contrast, the deletion of NRCAM significantly increased
nents (34 proteins, 29%), and proteins encoded by genes implicated NIV (Extended Data Fig. 5j, k), indicating that NRCAM is a negative
in disorders, including autism spectrum disorder and schizophrenia regulator of astrocytic elaboration into the neuropil. Thus, we focused
(34 proteins, 29%) (Fig. 2b). on NRCAM for further analysis.
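
The category percentages quoted for the 118-protein shared proteome, and the idea of keeping only proteins enriched in both datasets, can be checked with a short Python sketch. The category counts and the total of 118 come from the text; the function names, the placeholder protein names, their fold-enrichment values and the use of a greater-than-or-equal comparison at the 1.5-fold cut-off are hypothetical and serve only as illustration, not as the authors' analysis pipeline.

```python
def enriched(fold_changes, min_fold=1.5):
    """Return the proteins whose fold enrichment over control meets the cut-off."""
    return {name for name, fold in fold_changes.items() if fold >= min_fold}


def category_percentages(counts, total):
    """Express each category count as a whole-number percentage of `total`."""
    return {name: round(100 * n / total) for name, n in counts.items()}


# Hypothetical fold-enrichment values (placeholders, not data from the study).
split_turboid = {"protein_a": 3.2, "protein_b": 2.1, "protein_c": 1.1}
turboid_surface = {"protein_a": 2.7, "protein_b": 1.2, "protein_d": 1.9}

# Proteins enriched in both datasets form the shared, high-confidence set.
shared = enriched(split_turboid) & enriched(turboid_surface)

# Category counts as reported in the text for the 118 shared proteins.
CATEGORY_COUNTS = {
    "synaptic cleft": 29,
    "cell adhesion": 18,
    "channels": 18,
    "GPCRs and associated": 4,
    "other receptors and associated": 16,
    "secreted or extracellular matrix": 34,
    "disorder-implicated": 34,
}
percentages = category_percentages(CATEGORY_COUNTS, total=118)
```

The counts sum to more than 118, which is consistent with a protein being able to belong to more than one category.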
Adhesions between astrocytes and neurons have critical roles in
orchestrating the concurrent development of synapses and morpho-
genesis of astrocytes9,22. To identify regulators of this process, we NRCAM regulates astrocyte morphogenesis
selected teneurin-2 (TENM2), teneurin-4 (TENM4) and NRCAM as can- To confirm that endogenous NRCAM is labelled by Split-TurboID in vivo,
didate bridging molecules between astrocytes and neurons. To deplete we used STED imaging, which showed that NRCAM colocalizes with
target proteins in astrocytes, we used a CRISPR-based approach. We biotinylated proteins in vivo (Extended Data Fig. 6a). NRCAM has previ-
confirmed depletion of astrocytic NRCAM using this approach by ously been identified at contacts between axons and myelinating glia24,25
quantitative western blot analysis (Extended Data Fig. 5a). NRCAM and has been studied as a neuronal protein regulating dendritic spine
single guide RNA (sgRNA) in combination with astrocyte-specific pruning26,27 but not, to our knowledge, in astrocytes. Cell-type-specific
Cas9 significantly diminished the level of NRCAM protein in mixed transcriptome analysis shows that levels of mRNA encoding NRCAM
neuron–astrocyte cultures; this could be rescued by re-expression of are higher in astrocytes than in neurons or oligodendrocytes20,21. We
sgRNA-resistant human NRCAM in astrocytes (Extended Data Fig. 5b, c). confirmed NRCAM protein expression in cultured astrocytes by western
Next, we used this astrocyte-specific CRISPR-based approach in vivo to blot (Extended Data Fig. 6b). Next, we analysed NRCAM localization
rapidly gain preliminary data on candidate proteins23 (Extended Data in astrocytes in vivo by STED microscopy, observing that endogenous
Fig. 5d, e). We retro-orbitally injected AAVs containing sgRNA for each NRCAM puncta colocalized with astrocytic membranes (Extended
candidate gene together with Cre recombinase under the control of an Data Fig. 6c, d).
astrocyte-specific promoter (AAV PHP.eB-U6-sgRNA-GfaABC1D-Cre) NRCAM is known to function in part through a homophilic transcel-
into conditional Cas9 knock-in (KI) mice. Astrocyte-specific Cre lular interaction28. In agreement, when we injected neuron-specific and
expression was confirmed in vivo using a tdTomato Cre-reporter astrocyte-specific Nrcam-expressing viruses into P21 mice (Extended
line (Extended Data Fig. 5f, g). We used either a negative control virus Data Fig. 6e), we observed colocalization of sparsely expressed astro-
(AAV-empty sgRNA-GfaABC1D-Cre) or sgRNA virus against each cytic haemagglutinin-tagged NRCAM (NRCAM–HA) with neuronal
target gene along with astrocyte-specific mCherry–CAAX to quantify NRCAM–V5 (Extended Data Fig. 6f) by STED imaging at P42.
astrocyte morphology. NRCAM is also expressed during early postnatal development26,27.
Compared with controls, loss of TENM4 but not TENM2 in P42 Deletion of NRCAM from astrocytes during the first two weeks of devel-
mouse cortical astrocytes significantly decreased astrocyte territory opment significantly increased astrocytic territory size and enhanced
volume and the infiltration of fine astrocyte processes into the neuropil NIV when compared with controls (Extended Data Fig. 7a–g). These

[Fig. 3, panels a–m; images not reproduced. a, c, g, Schematics showing AAV-GfaABC1D-NRCAM-HA (astrocytic NRCAM), mCherry–CAAX (astrocyte process), excitatory synapses (VGLUT1, PSD95) and inhibitory synapses (VGAT, gephyrin), with injection at P21 and analysis at P42. b, d, h, k, STED images. e, f, Distance from VGLUT1 (nm) and distance from VGAT (nm) for ezrin and NRCAM. i, j, Astrocyte process–VGLUT1 and astrocyte process–PSD95 distance (nm) for control and NRCAM sgRNA. l, m, VGAT distance (nm) and gephyrin distance (nm) for control, NRCAM sgRNA, NRCAM sgRNA + hNRCAM, NRCAM sgRNA + neuroNRCAM sgRNA, and neuroNRCAM sgRNA.]
Fig. 3 | NRCAM controls astrocyte–neuron contacts in vivo. a, Schematic of the visualization of astrocytic NRCAM in vivo. b, Three-colour STED images demonstrating that astrocytic NRCAM is adjacent to excitatory synapses or inhibitory synapses. c, Schematic of astrocytic NRCAM distribution assay in vivo. d, Three-colour STED images showing mCherry–CAAX-positive NRCAM or ezrin adjacent to excitatory presynapses and inhibitory presynapses. e, f, Quantification of average distance between astrocytic NRCAM and VGLUT1 or VGAT (n = 30 puncta per condition from 3 brains). g, Schematic of in vivo astrocytic process–neuronal synapses contact assay. h–m, Three-colour STED images of an astrocytic process following deletion of astrocyte NRCAM adjacent to excitatory synapses (h) or inhibitory synapses (k). hNRCAM, human NRCAM; neuroNRCAM sgRNA, deletion of neuronal NRCAM. i, j, l, m, Quantification of average distance between astrocytic process and excitatory synapses (i, j) or inhibitory synapses (l, m) (n = 30 puncta per condition from 3 brains). n = 3 biological repeats. In e, f, i, j, Student's paired t-test. In l, m, one-way analysis of variance (ANOVA) with Dunnett's multiple comparison. Data are mean ± s.e.m. NS, not significant. Arrows in b, d, h highlight examples of adjacent synaptic fluorescent signals in the images.
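The Fig. 3 legend reports data as mean ± s.e.m. and uses Student's paired t-test. As a rough sketch of what those two summaries compute, the following uses only the Python standard library. The distance values are invented for illustration, the function names are hypothetical, and how the authors paired their measurements is not stated in the text, so the pairing here is an assumption.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a sample."""
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def paired_t_statistic(a, b):
    """Paired Student's t statistic and degrees of freedom for matched samples."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff, sem_diff = mean_sem(diffs)
    return mean_diff / sem_diff, len(diffs) - 1


# Made-up distances (nm) for illustration only; not data from the paper.
ezrin_nm = [92.0, 110.0, 101.0, 97.0, 105.0, 99.0]
nrcam_nm = [95.0, 104.0, 108.0, 93.0, 101.0, 103.0]

t_value, dof = paired_t_statistic(nrcam_nm, ezrin_nm)
print(mean_sem(ezrin_nm), mean_sem(nrcam_nm), t_value, dof)
```

Converting the t statistic to a P value needs the t distribution, which the standard library does not provide; a statistics package would normally be used for that step.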
phenotypes were rescued by coexpression of sgRNA-resistant NRCAM–HA in astrocytes (Extended Data Fig. 7b–g). NRCAM is a type I membrane protein with a modular extracellular domain architecture that is composed of repeated immunoglobulin and fibronectin domains (Extended Data Fig. 7b). To determine whether extracellular interactions of NRCAM are required for astrocytic morphogenesis in vivo, we created two deletion mutants of human NRCAM: NRCAM(ΔIG), comprising residues 620–1193 and lacking the immunoglobulin domain; and NRCAM(ΔECD), comprising residues 1030–1193 and lacking both immunoglobulin and fibronectin domains (Extended Data Fig. 7b, c). Neither mutant rescued the morphology of NRCAM-deleted astrocytes (Extended Data Fig. 7d–g), indicating that the extracellular interactions via immunoglobulin domains of NRCAM are necessary for maintaining the wild-type morphology. To test whether

[Fig. 4, panels a–n; images not reproduced. a, Immunoblots of input (brain lysate) and immunoprecipitates (control IgG or anti-NRCAM) probed for NRCAM, gephyrin, PSD95 and NRP2. b, Schematic of the HEK 293T co-culture assay. c, Images of eGFP, HA, GABAA receptor and merge for control, NRCAM–HA, neuroNRCAM sgRNA + NRCAM–HA and gephyrin sgRNA + NRCAM–HA. d, GABAA receptor signal (AU). e, mCherry–CAAX/gephyrin/VGAT images for control, NRCAM sgRNA, NRCAM sgRNA + hNRCAM, NRCAM sgRNA + neuroNRCAM sgRNA, and neuroNRCAM sgRNA. f, Gephyrin–VGAT colocalization (puncta per 100 μm2). g, Schematic of CRISPR–Cas9-mediated astrocytic NRCAM deletion in P42 Cas9 knock-in mice (AAV-NRCAM sgRNA + Cre) and the recording site. h, Traces for control and NRCAM sgRNA. i–n, mIPSC amplitude (pA), cumulative frequency against mIPSC amplitude (pA), mIPSC frequency (Hz), cumulative frequency against mIPSC inter-event interval (s), mIPSC rise time (ms), and mIPSC amplitude (pA) for fast (< 2.8 ms) and slow (> 2.8 ms) events.]
Fig. 4 | Astrocytic NRCAM controls inhibitory synaptic organization and function. a, Co-immunoprecipitation (IP) from cortical lysates of NRCAM with gephyrin, PSD95 and NRP2. b, Schematic of co-culture assay to identify effects of non-neuronal NRCAM–HA on inhibitory synaptic specializations. c, Images of NRCAM–HA coexpressed with eGFP in HEK 293T cells co-cultured with neurons depleted of NRCAM or gephyrin. d, Mean integrated intensity of GABAA receptor in contact with transfected HEK 293T cells counted from cells as in c (number of cells: n = 418 control, n = 416 NRCAM–HA, n = 297 neuroNRCAM sgRNA + NRCAM, n = 356 gephyrin sgRNA + NRCAM). e, Images of inhibitory synapses among NRCAM-deficient astrocytes. High magnification images (bottom) correspond to outlined areas (above); arrows indicate examples of colocalizing synaptic markers. f, Average number of inhibitory synaptic colocalized puncta within astrocyte territories from cells as in e. n = 15 cells per condition from 3 mice. g, Schematic of electrophysiology experiments in L2/3 pyramidal neurons of V1 cortex. h, mIPSC traces from L2/3 pyramidal neurons following astrocyte treatment with control (empty sgRNA) or NRCAM sgRNA. i–m, mIPSC amplitude (i, j), frequency (k, l) and rise time (m) (n = 20 cells per condition from 4 mice). n, mIPSC amplitudes sorted by fast and slow rise times. In d, f, one-way ANOVA with Dunnett's multiple comparison; n = 3–6 biological repeats. In i, k, n, Student's paired t-test. Data are mean ± s.e.m.
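Panel n of Fig. 4 sorts mIPSC events by rise time around a 2.8 ms cutoff. A minimal sketch of that sorting step is shown below. The event values are hypothetical, the function names are invented for this example, and the treatment of events exactly at 2.8 ms is an assumption, because the source defines fast as below and slow as above the cutoff without saying how ties were handled.

```python
from statistics import fmean

RISE_TIME_CUTOFF_MS = 2.8  # cutoff given for the fast/slow split in Fig. 4n


def split_by_rise_time(events, cutoff_ms=RISE_TIME_CUTOFF_MS):
    """Split (rise_time_ms, amplitude_pA) events into fast and slow amplitudes.

    Events exactly at the cutoff go to the slow group (an assumption;
    the source does not state how ties were handled).
    """
    fast = [amp for rise, amp in events if rise < cutoff_ms]
    slow = [amp for rise, amp in events if rise >= cutoff_ms]
    return fast, slow


def mean_or_none(values):
    """Mean of a list, or None when the list is empty."""
    return fmean(values) if values else None


# Hypothetical events for illustration: (rise time in ms, amplitude in pA).
events = [(1.2, 14.0), (2.1, 12.0), (2.8, 9.0), (3.5, 8.0), (4.0, 7.0)]
fast, slow = split_by_rise_time(events)
print(mean_or_none(fast), mean_or_none(slow))
```

Running the same split separately on control and NRCAM sgRNA recordings would give the per-group amplitudes that a comparison like the one in panel n is built on.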
the transcellular homophilic binding between astrocytic and neuronal NRCAMs is required for astrocyte morphogenesis, we targeted neuronal NRCAM using AAV-NrCAM sgRNA-hSynI-Cre. Depletion of NRCAM from only neurons or from both astrocytes and neurons enhanced astrocytic territory (at P14 but not at P42) and NIV (at both P14 and P42) to a similar degree to astrocyte-specific NRCAM deletion (Extended Data Fig. 7d–k). Together, these results indicate that homophilic binding between neuronal and astrocytic NRCAM restricts growth of astrocyte processes into the neuropil. This function of astrocytic NRCAM might be similar to its known role in promoting retraction of dendritic spines via semaphorin–plexin signalling in neurons26. Notably, SEMA7A and PLXNA4 were also detected in our proteomic analysis (Fig. 2b).

Astrocyte NRCAM modulates GABA synapses
Previous studies have shown that proper astrocyte morphogenesis is required for synaptic development, mediated through direct synaptic contact9. To determine whether astrocytic NRCAM is also important for astrocyte–synapse contacts, we used STED microscopy to analyse astrocyte-expressed NRCAM–HA with respect to excitatory and inhibitory synapses (Fig. 3a). We found that astrocytic NRCAM–HA closely associated with both synapse types (Fig. 3b). Then, to determine whether endogenous astrocytic NRCAM is localized at tripartite synaptic sites in vivo, we measured the distance from mCherry–CAAX-positive NRCAM puncta to excitatory (VGLUT1+) and inhibitory (VGAT+) presynapses (Fig. 3c) and compared this to localization of ezrin, a protein known to be localized to perisynaptic astrocyte processes. NRCAM and ezrin puncta were at a similar distance from presynapses29,30 (Fig. 3d–f), demonstrating that astrocytic NRCAM is localized at astrocyte–synapse contacts in vivo.

To determine the effect of NRCAM loss on astrocyte–neuron contacts in vivo, we measured the distance between mCherry–CAAX-labelled astrocytic processes and excitatory or inhibitory synapses (Fig. 3g). Deletion of astrocytic NRCAM did not alter the distance between astrocytic processes and excitatory pre- or postsynapses (Fig. 3h–j). However, the distance of astrocytic processes from inhibitory pre- and postsynapses was significantly increased (Fig. 3k–m). Furthermore, simultaneous deletion of both astrocytic and neuronal NRCAM, or of neuronal NRCAM alone, similarly disrupted contacts between astrocytes and inhibitory synapses (Fig. 3k–m).

The impairment of astrocyte–inhibitory synapse contacts due to loss of astrocytic NRCAM was rescued by expression of human NRCAM (Fig. 3k–m). This effect appeared to be directly related to NRCAM depletion as the levels of other proteins implicated in astrocyte–neuron interactions were unaffected (Extended Data Fig. 8a, b). Together, these results strongly support a model in which homophilic NRCAM interactions between astrocytes and neurons mediate adhesions between astrocytes and inhibitory synapses.

In previous studies, we identified neuronal NRCAM in the proteome of GABAergic postsynapses using iBioID with the inhibitory synapse organizer gephyrin as the bait19. Indeed, NRCAM co-immunoprecipitated with GFP–gephyrin when coexpressed in HEK 293T cells (Extended Data Fig. 8c), and endogenous NRCAM co-immunoprecipitated with gephyrin from brain lysate (Fig. 4a). The positive controls PSD95 and neuropilin-2 (NRP2)26 were also detected in these co-immunoprecipitations, whereas negative control IgG did not precipitate gephyrin or positive-control proteins (Fig. 4a). These results indicate that NRCAM forms a complex with the neuronal GABAergic synaptic scaffolding protein gephyrin, and thus it may have a critical role in inhibitory synapse development in vivo.

To test whether NRCAM functions as an organizer for inhibitory synaptic specializations, we used an in vitro HEK 293T–neuron co-culture assay31–33. In this assay, consistent with previous studies, expression of NL2 in HEK 293T cells induced ectopic formation of excitatory (VGLUT1+) and inhibitory (VGAT+) presynapses32,33 (Extended Data Fig. 8e–l). Similarly, expression of presynaptic neurexin-1β (NRX1β) induced excitatory (HOMER1+) and inhibitory (GABAA receptor-positive) postsynapses around the HEK 293T cells31,33 (Extended Data Fig. 8e–l). When NRCAM, NRCAM(ΔIG) or NRCAM(ΔECD) were expressed in HEK 293T cells co-cultured with neurons31–33 (Extended Data Fig. 8b–l), the expression of NRCAM, but not the mutant NRCAMs, induced ectopic formation of inhibitory pre- and postsynaptic contacts (Extended Data Fig. 8e–h). NRCAM did not recruit excitatory synaptic specializations onto HEK 293T cells (Extended Data Fig. 8i–l). Of note, when NRCAM or gephyrin were deleted from neurons using specific sgRNAs (Fig. 4b), the ability of NRCAM-expressing HEK 293T cells to promote clustering of inhibitory postsynapses was abolished (Fig. 4c, d). Together, these data indicate that transcellular homophilic NRCAM interactions control the organization of inhibitory synaptic specializations via neuronal gephyrin.

NRCAM controls inhibition in vivo
Next, we examined the requirement of astrocytic NRCAM for excitatory or inhibitory synaptic structure and function in the mouse visual cortex. When we quantified the intracortical synapses of layer 2/3 neurons that are abundant in layer 1, we found that deletion of astrocytic NRCAM did not alter excitatory synapse number (Extended Data Fig. 9a, b). By contrast, deletion of astrocytic NRCAM significantly decreased inhibitory synapses in layer 2/3 of the mouse visual cortex (Fig. 4e, f). The effect of astrocytic NRCAM deletion on inhibitory synapse number was rescued by the expression of human NRCAM (Fig. 4e, f). The deletion of NRCAM from neurons alone or from both neurons and astrocytes significantly decreased inhibitory synapse numbers (Fig. 4e, f). This result further supports the idea that NRCAM bridges astrocytes and neurons via homophilic interactions to control inhibitory synapses.

To determine the functional consequences of deleting NRCAM from astrocytes, we performed whole-cell patch-clamp recordings of miniature excitatory postsynaptic currents (mEPSCs) and inhibitory postsynaptic currents (mIPSCs) of pyramidal neurons in layer 2/3 (Fig. 4g). The amplitude of mEPSCs was slightly decreased by the deletion of astrocytic NRCAM, but the frequency was not altered (Extended Data Fig. 9c–g). By contrast, both amplitude and frequency of mIPSCs were significantly decreased following NRCAM deletion compared with controls (Fig. 4h–l). Inhibitory synapses that develop onto pyramidal neurons are established by a heterogeneous population of interneurons, targeting either perisomatic or distal dendritic regions34. Owing to their juxtaposition to the recording electrode, somatic mIPSC events have much steeper rise kinetics than distal dendritic events, and rise time can thus be used to distinguish between these two populations35,36. Notably, we saw an increase in the rise time of mIPSCs (Fig. 4m), which, when events were separated into fast and slow (fast being less than 2.8 ms and slow being over 2.8 ms)36, showed a significant decrease in mIPSC amplitude for the fast (somatic) events compared with the slower (dendritic) rise-time events (Fig. 4n). Thus, astrocytic NRCAM is probably important for proper somatic inhibitory synaptic development and function in vivo. It will be interesting to analyse how these effects modulate GABAergic networks, such as during visual cortical critical periods, in future studies.

Dissection of the in vivo chemical-affinity codes that organize the wiring of the brain in a cell-type-specific manner from tissue has remained a considerable challenge. In this study, we have developed an in vivo BioID approach for discovery of extracellular cell–cell contact proteomes (Extended Data Fig. 10, top). Our Split-TurboID approach differs from analogous methods in two ways: it can specify labelling of junctions between two genetically defined cell types, and it can be applied in vivo. Previously, synaptic cleft proteomics studies have been performed in vitro with split horseradish peroxidase (HRP) conjugated with neurexin and neuroligin37,38 or with the synaptic adhesion molecule SynCAM1 (ref. 39). Both approaches have been highly successful, identifying excitatory and inhibitory synaptic cleft proteins in cultured cortical neurons. HRP-based labelling has the advantage of labelling synaptic clefts on a minute timescale in cultured cells or ex vivo37,38,40. However, a concern with this method is that the labelling requires H2O2, which is cytotoxic and difficult to use in living brain tissue while maintaining complex multicellular interactions of the neuropil. We designed TurboID-surface and Split-TurboID to overcome this issue. A different version of split-TurboID has been described recently for intracellular labelling between endoplasmic reticulum and mitochondria41. It will be interesting to test how this version performs when displayed extracellularly between cell types.

Astrocytes have been proposed to control inhibitory synapse formation via secreted proteins42,43; however, the presence of adhesion-based mechanisms through which astrocyte contacts control inhibitory synaptogenesis remains largely unknown. In this study we show that astrocytic and neuronal NRCAMs bridge these two cell types to foster inhibitory postsynaptic specializations via gephyrin. We propose that these postsynaptic specializations then recruit presynaptic neuronal partners, to direct the formation of tripartite inhibitory synapses (Extended Data Fig. 10, bottom). Loss of perisynaptic NRCAM interactions results in significant deficits of GABAergic transmission, with slight reductions in the amplitudes of glutamatergic responses. These reduced glutamatergic responses may be due to a well-documented homeostatic response to reduced inhibition44,45, given the lack of effects of NRCAM on excitatory synapse formation in co-culture and depletion assays. Thus, our proteomic analysis reveals both a mechanism for how astrocytes modulate inhibitory synapses, and a protein

map to provide a basis for future studies of astrocyte–neuron signalling at synapses.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2926-0.

1. Yu, X., Nagai, J. & Khakh, B. S. Improved tools to study astrocytes. Nat. Rev. Neurosci. 21, 121–138 (2020).
2. Lanjakornsiripan, D. et al. Layer-specific morphological and molecular differences in neocortical astrocytes and their dependence on neuronal layers. Nat. Commun. 9, 1623 (2018).
3. Araque, A., Parpura, V., Sanzgiri, R. P. & Haydon, P. G. Tripartite synapses: glia, the unacknowledged partner. Trends Neurosci. 22, 208–215 (1999).
4. Khakh, B. S. & Sofroniew, M. V. Diversity of astrocyte functions and phenotypes in neural circuits. Nat. Neurosci. 18, 942–952 (2015).
5. Ma, Z., Stork, T., Bergles, D. E. & Freeman, M. R. Neuromodulators signal through astrocytes to alter neural circuit activity and behaviour. Nature 539, 428–432 (2016).
6. Papouin, T., Dunphy, J., Tolman, M., Foley, J. C. & Haydon, P. G. Astrocytic control of synaptic function. Phil. Trans. R. Soc. Lond. B 372, 20160154 (2017).
7. Panatier, A. et al. Astrocytes are endogenous regulators of basal transmission at central synapses. Cell 146, 785–798 (2011).
8. Araque, A. et al. Gliotransmitters travel in time and space. Neuron 81, 728–739 (2014).
9. Stogsdill, J. A. et al. Astrocytic neuroligins control astrocyte morphogenesis and synaptogenesis. Nature 551, 192–197 (2017).
10. Stork, T., Sheehan, A., Tasdemir-Yilmaz, O. E. & Freeman, M. R. Neuron–glia interactions through the Heartless FGF receptor signaling pathway mediate morphogenesis of Drosophila astrocytes. Neuron 83, 388–403 (2014).
11. Sloan, S. A. & Barres, B. A. Mechanisms of astrocyte development and their contributions to neurodevelopmental disorders. Curr. Opin. Neurobiol. 27, 75–81 (2014).
12. Allen, N. J. & Lyons, D. A. Glia as architects of central nervous system formation and function. Science 362, 181–185 (2018).
13. Branon, T. C. et al. Efficient proximity labeling in living cells and organisms with TurboID. Nat. Biotechnol. 36, 880–887 (2018).
14. Schopp, I. M. et al. Split-BioID a conditional proteomics approach to monitor the composition of spatiotemporally defined protein complexes. Nat. Commun. 8, 15690 (2017).
15. De Munter, S. et al. Split-BioID: a proximity biotinylation assay for dimerization-dependent protein interactions. FEBS Lett. 591, 415–424 (2017).
16. Kinoshita, N. et al. Genetically encoded fluorescent indicator GRAPHIC delineates intercellular connections. iScience 15, 28–38 (2019).
17. Lee, Y., Messing, A., Su, M. & Brenner, M. GFAP promoter elements required for region-specific and astrocyte-specific expression. Glia 56, 481–493 (2008).
18. Chan, K. Y. et al. Engineered AAVs for efficient noninvasive gene delivery to the central and peripheral nervous systems. Nat. Neurosci. 20, 1172–1179 (2017).
19. Uezu, A. et al. Identification of an elaborate complex mediating postsynaptic inhibition. Science 353, 1123–1129 (2016).
20. Zhang, Y. et al. An RNA-sequencing transcriptome and splicing database of glia, neurons, and vascular cells of the cerebral cortex. J. Neurosci. 34, 11929–11947 (2014).
21. Zhang, Y. et al. Purification and characterization of progenitor and mature human astrocytes reveals transcriptional and functional differences with mouse. Neuron 89, 37–53 (2016).
22. Sakers, K. & Eroglu, C. Control of neural development and function by glial neuroligins. Curr. Opin. Neurobiol. 57, 163–170 (2019).
23. Incontro, S., Asensio, C. S., Edwards, R. H. & Nicoll, R. A. Efficient, complete deletion of synaptic proteins using CRISPR. Neuron 83, 1051–1057 (2014).
24. Custer, A. W. et al. The role of the ankyrin-binding protein NrCAM in node of Ranvier formation. J. Neurosci. 23, 10032–10039 (2003).
25. Feinberg, K. et al. A glial signal consisting of gliomedin and NrCAM clusters axonal Na+ channels during the formation of nodes of Ranvier. Neuron 65, 490–502 (2010).
26. Demyanenko, G. P. et al. Neural cell adhesion molecule NrCAM regulates Semaphorin 3F-induced dendritic spine remodeling. J. Neurosci. 34, 11274–11287 (2014).
27. Mohan, V. et al. Temporal regulation of dendritic spines through NrCAM-Semaphorin3F receptor signaling in developing cortical pyramidal neurons. Cereb. Cortex 29, 963–977 (2019).
28. Mauro, V. P., Krushel, L. A., Cunningham, B. A. & Edelman, G. M. Homophilic and heterophilic binding activities of Nr-CAM, a nervous system cell adhesion molecule. J. Cell Biol. 119, 191–202 (1992).
29. Derouiche, A., Anlauf, E., Aumann, G., Mühlstädt, B. & Lavialle, M. Anatomical aspects of glia-synapse interaction: the perisynaptic glial sheath consists of a specialized astrocyte compartment. J. Physiol. Paris 96, 177–182 (2002).
30. Lavialle, M. et al. Structural plasticity of perisynaptic astrocyte processes involves ezrin and metabotropic glutamate receptors. Proc. Natl Acad. Sci. USA 108, 12915–12919 (2011).
31. Scheiffele, P., Fan, J., Choih, J., Fetter, R. & Serafini, T. Neuroligin expressed in nonneuronal cells triggers presynaptic development in contacting axons. Cell 101, 657–669 (2000).
32. Graf, E. R., Zhang, X., Jin, S. X., Linhoff, M. W. & Craig, A. M. Neurexins induce differentiation of GABA and glutamate postsynaptic specializations via neuroligins. Cell 119, 1013–1026 (2004).
33. Chih, B., Gollan, L. & Scheiffele, P. Alternative splicing controls selective trans-synaptic interactions of the neuroligin–neurexin complex. Neuron 51, 171–178 (2006).
34. Tremblay, R., Lee, S. & Rudy, B. GABAergic interneurons in the neocortex: from cellular properties to circuits. Neuron 91, 260–292 (2016).
35. Miles, R., Tóth, K., Gulyás, A. I., Hájos, N. & Freund, T. F. Differences between somatic and dendritic inhibition in the hippocampus. Neuron 16, 815–823 (1996).
36. Wierenga, C. J. & Wadman, W. J. Miniature inhibitory postsynaptic currents in CA1 pyramidal neurons after kindling epileptogenesis. J. Neurophysiol. 82, 1352–1362 (1999).
37. Martell, J. D. et al. A split horseradish peroxidase for the detection of intercellular protein-protein interactions and sensitive visualization of synapses. Nat. Biotechnol. 34, 774–780 (2016).
38. Loh, K. H. et al. Proteomic analysis of unbounded cellular compartments: synaptic clefts. Cell 166, 1295–1307 (2016).
39. Cijsouw, T. et al. Mapping the proteome of the synaptic cleft through proximity labeling reveals new cleft proteins. Proteomes 6, E48 (2018).
40. Li, J. et al. Cell-surface proteomic profiling in the fly brain uncovers wiring regulators. Cell 180, 373–386 (2020).
41. Cho, K. F. et al. Split-TurboID enables contact-dependent proximity labeling in cells. Proc. Natl Acad. Sci. USA 117, 12143–12154 (2020).
42. Elmariah, S. B., Oh, E. J., Hughes, E. G. & Balice-Gordon, R. J. Astrocytes regulate inhibitory synapse formation via Trk-mediated modulation of postsynaptic GABAA receptors. J. Neurosci. 25, 3638–3650 (2005).
43. Hughes, E. G., Elmariah, S. B. & Balice-Gordon, R. J. Astrocyte secreted proteins selectively increase hippocampal GABAergic axon length, branching, and synaptogenesis. Mol. Cell. Neurosci. 43, 136–145 (2010).
44. Turrigiano, G. G., Leslie, K. R., Desai, N. S., Rutherford, L. C. & Nelson, S. B. Activity-dependent scaling of quantal amplitude in neocortical neurons. Nature 391, 892–896 (1998).
45. O’Brien, R. J. et al. Activity-dependent modulation of synaptic AMPA receptor accumulation. Neuron 21, 1067–1078 (1998).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© The Author(s), under exclusive licence to Springer Nature Limited 2020

Methods (ThermoFisher, A-11034), Alexa Fluor 488 Goat anti-Guinea pig (Ther-
moFisher, A11073), Alexa Fluor 488 Goat anti-Chicken (ThermoFisher,
No statistical methods were used to predetermine sample size. The A-11006), Oregon Green 488 Goat anti-Rabbit (ThermoFisher, O-11038),
experiments were not randomized. The investigators were blinded to Alexa Fluor 555 Goat anti-Rabbit (ThermoFisher, A21428), Alexa Fluor
allocation during experiments and outcome assessment. 568 Goat anti-Rat (ThermoFisher, A-11077), Alexa Fluor 594 Streptavidin
(ThermoFisher, S11227), Alexa Fluor 647 Donkey anti-rabbit (Ther-
Animals moFisher, A31573), Alexa Fluor 647 Goat anti-Chicken (ThermoFisher,
All mice were housed (2–5 mice per cage) at the Division of Laboratory A-21449), Alexa Fluor 647 Donkey anti-Guinea pig ( Jackson ImmunoRe-
Animal Resources facilities at Duke University. All procedures were con- search, 706-605-148), Alexa Fluor 647 Streptavidin (ThermoFisher,
ducted with a protocol approved by the Duke University Institutional S21374), Atto647N anti-Mouse (Sigma, 50185), Atto647N anti-rabbit
Animal Care and Use Committee in accordance with US National Insti- (Sigma, 40839), Donkey anti-Goat IRDye 800CW (LI-COR, 926-32214),
tutes of Health guidelines. All mice were kept under typical day:night Goat anti-rat IRDye 800CW (LI-COR, 925-32219) and Goat anti-Mouse
conditions of 12 h cycles. CD1 (022, Charles River), Cas9 (028239) and IRDye 680RD (LI-COR, 925-6818).
Ai14 (007914) mice were purchased from Jackson laboratory. Both
males and females were used, ages ranged from P0 to P42. AAV production
AAVs were produced as previously described19,46. In brief, HEK 293T cells
Plasmid construction (obtained from ATCC, CRL-11268; short tandem repeat confirmed and
pZac2.1-GfaABC1D-Lck-GCaMP6f was a gift from B. Khakh (UCLA) mycoplasma negative) were transfected with pAd-DELTA F6, serotype
(Addgene plasmid #52924). pcDNA3-V5-TurboID-NES was a gift plasmid AAV PHP.eB and AAV plasmid. After 72 h, the cells were lysed
from A. Ting (Stanford) (Addgene plasmid #107169). GRAPHIC was in 15 mM NaCl, 5 mM Tris-HCl, pH 8.5, and incubated with 50 U ml−1
obtained as previously described16. TurboID was subcloned into benzonase for 30 min at 37 °C. The cell lysate was then centrifuged
pZac2.1-GfaABC1D vector. The split sites of Split 1-TurboID and at 4,500 rpm for 30 min at 4 °C, and the supernatant containing AAV
Split 2-TurboID were at the 256/257 and 140/141 amino acid posi- was added to the top of an iodixanol gradient (15%, 25%, 40% and 60%
tion, respectively. The N-TurboID and C-TurboID fragments were iodixanol solution, top to bottom) and centrifuged using a Beckman
subcloned into AAV-hSynI and pZac2.1-GfaABC1D vector, respec- Ti-70 rotor, spun at 67,000 rpm for 1 h. The viral solution extracted
tively. AAV-hSynI-EGFP and pZac2.1-GfaABC1D-mCherry-CAAX were from the virus layer (between the 40% and 60% iodixanol layers) with
previously described9,19. GfaABC1D was amplified and subcloned a 24-gauge needle and 5-ml syringe, and concentrated with a 100-kDa
into AAV-U6-sg-Cre vectors. The sgRNA sequences used were as filter. Viral titres were measured by quantitative PCR using a linearized
follows: Tenm2, 5′- ATCTGGAATAATGGATGTAAAGG-3′; Tenm4, genome plasmid as a standard47. For small-scale AAV supernatant, HEK
5′- GCCAGAGGCCATGGACGTGAAGG-3′; Nrcam, 5′- GTGCCAGA 293T cells were transfected pAd-DELTA F6, serotype plasmid AAV PHP.
TGATCAGCGCGCTGG-3′. The gephyrin sgRNA was obtained as previ- eB or AAV2/1 and AAV plasmid. After 72 h, the AAV-containing super-
ously described19. The cDNA encoding human NRCAM (Gene ID4897, natant medium was collected and filtered with a 0.45-μm cellulose
Dharmacon) was amplified and subcloned into AAV-Ef1α, AAV-hSynI, acetate Spin-X centrifuge tube filter (Costar 8162).
pZac2.1-GfaABC1D vector. The fragments encoding human NRCAM
mutants (hNrCAM-ΔIg and hNrCAM-ΔECD) were subcloned into Primary neuronal, astrocytic and HEK 293T cell cultures
AAV-Ef1α and pZac2.1-GfaABC1D vector. pCAG-HA-Nrxn1beta AS4(-) Cortical neurons and astrocytes were prepared from P1 mouse
and pNICE-NL2(-) were gifts from P. Scheiffele (University of Basel) pups. These cells were seeded on coverslips or dishes coated with
(Addgene plasmid #59409 and #15246, respectively). pEGFP-gephyrin poly-l-lysine (Sigma) and cultured in neurobasal medium A (Invitrogen)
and pcDNA-PSD95-GFP were previously described19. All constructs supplemented with B-27 (Invitrogen) and 1 mM GlutaMAX (Invitrogen).
were confirmed by DNA sequencing. All primers are shown in Sup- Mouse cortical astrocytes were prepared as previously described9.
plementary Table 4. P0-3 mouse cortices were microdissected and papain digested fol-
lowed by trituration in low and high ovomucoid solutions. Cells were
Antibodies passed through a 20-μm mesh filter, resuspended in astrocyte growth
The following antibodies were used: monoclonal anti-V5 (Thermo medium (AGM; DMEM (Gibco 11960), 10% FBS, 10 μM hydrocortisone,
Fisher, R960-25, immunoblot (IB) 1:1,000, immunofluorescence (IF) 100 U ml−1 penicillin/streptomycin, 2 mM l-glutamine, 5 μg ml−1 insulin,
1:500, immunohistochemistry (IHC) 1:500), rat anti-HA (Sigma, 1 mM sodium pyruvate, 5 μg ml−1 N-acetyl-l-cysteine) and 30 million
12158167001, IB 1:1,000, IF 1:500, IHC 1:200), mouse anti-HA (Biole- cells were plated on 75-cm2 flasks (non-ventilated cap) coated with
gend, MMS-101P, IB 1:1,000), chicken anti-GFP (Abcam, ab13970, IB poly-d-lysine. Flasks containing cells were incubated at 37 °C in 10%
1:1,000, IF 1:1,000, IHC 1:1,000), rabbit anti-mCherry (Abcam, ab167453, CO2. On day in vitro (DIV) 3, AGM was removed and replaced with DPBS.
IF 1:500, IHC 1:500), rabbit anti-PSD95 (Life Technologies, 51-6900, Flasks were then shaken vigorously by hand for 10–15 s until only the
IHC 1:200), mouse anti-PSD95 (ThermoFisher, 7E3, IB 1:1,000), guinea adherent monolayer of astroglia remained. DPBS was then replaced
pig anti-VGLUT1 (Synaptic Systems, 135-304, IF 1:1,000, IHC 1:1,000), with fresh AGM. On DIV 4, the medium was supplemented with AraC
rabbit anti-gephyrin (Synaptic Systems, 147-002, IF 1:1,000, IHC 1:500), protein for 3 days to eliminate fast dividing cells, and astrocytes were
mouse anti-gephyrin (Synaptic Systems, 147-011, IB 1:1,000, IF 1:300), treated with AAVs. On DIV 7, astrocytes were passaged into 6-well dishes
guinea pig anti-VGAT (Synaptic Systems, 131-004, IF 1:1,000, IHC 1:500), (400,000 cells per well) and half the medium was replaced every 2–3
rabbit anti-NL2 (Synaptic Systems, 129-202, IB 1:500), rabbit anti-NRCAM days. On DIV 14, astrocytes were collected for immunoblotting analy-
(Abcam, ab24344, IB 1:1,000, IHC 1:200), rabbit anti-HOMER1 (Synaptic sis. HEK 293T (obtained from ATCC, CRL-11268; short tandem repeat
Systems, 160002, IF 1:2,000), rabbit anti-GABA-A receptor β2 (Synaptic confirmed and mycoplasma negative) cells were maintained in DMEM
Systems, 224-803, IF 1:1,000), goat anti-neuropilin-2 (R & D Systems, (Gibco) supplemented with 10% FBS (Gibco) and 100 U ml−1 penicillin/
AF567, IB 1:500), rat anti-tdTomato (Kerafast, EST203, IHC 1:1,000), streptomycin. Cell lines were incubated at 37 °C in 5% CO2. Cells were
rat anti-tubulin (Santa Cruz, sc-53029, IB 1:1,000), rabbit anti-Ezrin regularly passaged every three days.
(Cell Signaling, 3142, IHC 1:200), rabbit anti-EAAT2 (GLT1) (Alomone,
AGC-022, IB 1:1,000), rabbit anti-KIR4.1 (Alomone, APC-035, IB 1:500), Immunostaining and imaging analysis
rabbit anti-NL3 (Novus, NBP1-90080, IB 1:500), Alexa Fluor 488 Goat Cultured neurons and astrocytes were infected with small-scale AAVs at
anti-Mouse (ThermoFisher, A32723), Alexa Fluor 488 Goat anti-Rabbit DIV 14. After 3 days, these cells were treated with 500 μM biotin for 6 h.
Article
Neurons and astrocytes were fixed at indicated time points in 4% PFA, astrocytes expressing mCherry–CAAX were imaged by 63× plus 1×
4% sucrose for 20 min at room temperature. They were permeabilized optical zoom high magnification on Zeiss 780 confocal microscope
with 0.1% Triton X-100 and 10% normal goat serum (NGS) for 30 min (Zen Software). Synapse number quantification by colocalization takes
at room temperature. Samples were then incubated overnight at advantage of the fact that pre- and postsynaptic proteins appear colo-
4 °C with primary antibodies followed by Alexa Fluor 488-, Alexa Fluor calized at synaptic junctions due to their close proximity. Each Z-stack
555 or Alexa 647-conjugated secondary antibodies diluted in PBS con- was converted into 5 maximum projection images by condensing three
taining 0.01% Triton X-100 and 10% NGS for 2 h at room temperature. consecutive optical sections using ImageJ. The number of colocal-
The neuron and HEK 293T cells mixed-culture assay was performed ized synaptic puncta of excitatory intracortical (VGLUT1–PSD95), and
as previously described32,33. In brief, HEK 293T cells were transfected inhibitory (VGAT–gephyrin) were obtained using the ImageJ plugin
using Lipofectamine 2000 according to the manufacturer’s instruc- Puncta Analyzer50 (B. Wark, available upon request from cagla.eroglu@
tions. After 20 h, transfected HEK 293T cells were seeded on cultured dm.duke.edu). For each image, colocalized synaptic puncta were quan-
neurons at DIV 14. Fluorescence images were acquired with Zeiss Imager tified within astrocytes from ROIs of 100 μm2 area that were focused
M2 upright microscope equipped with an Apotome module, Zeiss 710, away from regions with neuronal cell bodies (areas lacking synaptic
Zeiss 780 or Zeiss 880 confocal microscopes using the Zen Software puncta). Statistical analysis of the synaptic staining was performed with
or a stimulated emission depletion (STED) super resolution micro- a one-way ANOVA followed by a post hoc Fisher’s LSD test when neces-
scope (TCS SP8 STED, Leica Microsystems) using the Leica Application sary. Images were analysed blinded to the experimental conditions.
Suite (LAS) software. The individual acquiring the images was always
blinded to the experiment. Images were quantified and post-processed Analysis of synaptic distance
using FIJI. Synaptic distance was analysed with super-resolution imaging as pre-
viously described51. In brief, P42 control and experimental tissue sec-
Immunohistochemistry and imaging analysis tions were stained with an antibody against mCherry, NRCAM, Ezrin
Immunohistochemistry was performed as previously described48,49. In and synaptic markers (VGLUT1, PSD95, VGAT and gephyrin). Optical
brief, brains were fixed in 4% PFA, 4% sucrose, and coronally or sagittally sections of astrocytes expressing mCherry–CAAX were imaged by
sectioned with a cryostat (Leica Microsystems) at a thickness of 40 μm 93× plus 5× optical zoom high magnification on a STED microscope
or 100 μm. The slices were incubated with primary antibodies diluted (TCS SP8 STED, Leica Microsystems). The distance was measured as
in PBS containing 0.1% Triton X-100 and 10% NGS at 4 °C for 2 days fol- the distance between the peak positions of the two distributions of
lowed by Alexa Fluor 488- or Alexa Fluor 555- or Alexa 647-conjugated localization points using the Leica Application Suite (LAS) software.
secondary antibodies diluted in PBS containing 0.1% Triton X-100 and Statistical analysis was performed with a Student’s t-test or one-way
10% NGS for 2 h at room temperature. The nuclei were visualized by ANOVA followed by a post hoc Fisher’s LSD test when necessary. Images
staining with DAPI. were analysed blinded to the experimental conditions.
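As an illustration of the peak-to-peak measurement described under "Analysis of synaptic distance", the following minimal Python sketch estimates the peak position of each distribution of localization points from a histogram and reports the separation of the two peaks. It is not the Leica LAS procedure used in the study. The function names, the one-dimensional coordinate convention, the 10-nm bin width and the example coordinates are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def peak_position(positions_nm, bin_width_nm=10.0):
    """Return the centre (nm) of the most populated histogram bin.

    positions_nm: 1D coordinates of localization points along the
    trans-synaptic axis (hypothetical input format).
    """
    if not positions_nm:
        raise ValueError("no localization points supplied")
    # Count points per bin; floor division also handles negative coordinates.
    counts = Counter(int(p // bin_width_nm) for p in positions_nm)
    # Pick the fullest bin; ties resolve to the lowest bin index.
    best_bin = min(counts, key=lambda b: (-counts[b], b))
    return (best_bin + 0.5) * bin_width_nm


def peak_distance(first_nm, second_nm, bin_width_nm=10.0):
    """Distance (nm) between the peak positions of two distributions."""
    return abs(peak_position(second_nm, bin_width_nm)
               - peak_position(first_nm, bin_width_nm))


# Made-up coordinates for two markers (illustrative values only).
marker_a = [2, 5, 8, 12, 31]
marker_b = [95, 101, 104, 108, 140]
print(peak_distance(marker_a, marker_b))
```

A histogram mode is the simplest peak estimator; fitting a Gaussian to each distribution would give sub-bin precision but needs a fitting library.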
Astrocyte morphology was analysed as previously described9. For
the astrocyte territory volume analysis, entire astrocytes expressing In vivo TurboID protein purification
mCherry–CAAX in 100 μm-thick floating sections were imaged using In vivo TurboID experiments were performed as previously
a 63× objective with 1× optical zoom images on the Zeiss 780 upright described19,46, with some modifications. Each AAV-TurboID probe
confocal microscope (Zen Software) and processed with Imaris soft- virus was retro-orbitally injected into CD1 juvenile mouse brain (P21).
ware. The fluorescence signal from each astrocyte was reconstructed Three weeks after viral injection, biotin was subcutaneously injected
using the surface tool. The intersecting nodes of the surface render at 24 mg kg−1 for 7 consecutive days to increase the biotinylation effi-
(vertices) were identified using the Matlab Xtension ‘Visualize Surface ciency. For each TurboID probe, 4–10 mice were used for biotinylated
Spots’. The Matlab Xtension ‘Convex hull’ identified the most terminal protein purification. Each purification was performed independently
vertices (outside edges of the 3D surface render) and created an addi- at least three times. Each cortex was lysed in 50 mM Tris/HCl, pH 7.5;
tional surface render to connect these terminal vertices by the shortest 150 mM NaCl; 1 mM EDTA; protease inhibitor mixture (cOmplete Mini
distance possible. Thus, a surface render of the outer rim (that is, terri- EDTA-free, Roche); and phosphatase inhibitor mixture (PhosSTOP,
tory) of each astrocyte was formed. The volume of each territory was Roche). The lysed samples were added to an equal volume of 50 mM
measured in Imaris and recorded. Astrocyte territory sizes between Tris-HCl, pH 7.5, 150 mM NaCl, 1 mM EDTA, 0.4% SDS, 2% Triton X-100,
experimental conditions were statistically analysed using one-way 2% deoxycholate, protease inhibitor mixture and phosphatase inhibi-
ANOVA followed by Fisher’s least significant difference (LSD) post hoc tor mixture, and then sonicated and centrifuged at 15,000g for 10 min.
test when necessary. The individual analysing the images was always Supernatant was further ultracentrifuged at 100,000g for 30 min at
blinded to the experimental conditions. For the NIV analysis, astrocytes 4 °C (Beckman TLA-100 ultracentrifuge, TLA-55 rotor). SDS was added
expressing mCherry–CAAX were imaged by 63× plus 2× optical zoom to the cleared supernatant to a final concentration of 1% and heated at
high magnification on Zeiss 710 confocal microscope. The images were 45 °C for 45 min. The sample was cooled on ice and incubated with Pierce
uploaded into Imaris Bitplane software for 3D reconstructions. We High Capacity NeutrAvidin Agarose (ThermoFisher) at 4 °C overnight.
chose at least three regions of interest (ROIs) measuring 200 pixels × Beads were washed twice with 2% SDS; twice with 1% Triton X-100, 1%
200 pixels × 20 pixels from each astrocyte that were devoid of the soma deoxycholate, 25 mM LiCl; twice with 1 M NaCl and 5 times with 50 mM
and large branches. ROIs were reconstructed using the surface tool in ammonium bicarbonate. Biotinylated proteins were eluted in 125 mM
Imaris. NIV was calculated in Imaris and statistically analysed using a Tris-HCl, pH 6.8, 4% SDS, 0.2% β-mercaptoethanol, 20% glycerol and
one-way ANOVA followed by a Fisher’s LSD post hoc test. Images were 3 mM biotin at 60 °C for 15 min.
analysed blinded to the experimental conditions.
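The comparison of territory volumes or NIV across conditions (one-way ANOVA followed by Fisher's LSD) can be sketched as below. This is a minimal illustration using only the Python standard library, not the software used in the study. The function names, group labels and numbers are hypothetical. The sketch computes the F statistic and the pairwise LSD t statistics; converting these to P values requires the F and t distributions, which are left out here.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def one_way_anova(groups):
    """groups: dict mapping condition name -> list of measurements.

    Returns (F, df_between, df_within, mse), where mse is the pooled
    within-group mean square used later by Fisher's LSD.
    """
    values = [v for g in groups.values() for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    df_between, df_within = k - 1, n - k
    mse = ss_within / df_within
    return (ss_between / df_between) / mse, df_between, df_within, mse


def fisher_lsd_t(groups, mse):
    """Pairwise LSD t statistics (df = N - k), using the pooled ANOVA mse."""
    stats = {}
    for a, b in combinations(groups, 2):
        ga, gb = groups[a], groups[b]
        se = sqrt(mse * (1 / len(ga) + 1 / len(gb)))
        stats[(a, b)] = (mean(ga) - mean(gb)) / se
    return stats


# Made-up measurements for three conditions (illustrative only).
data = {"control": [1, 2, 3], "sgA": [2, 3, 4], "sgB": [5, 6, 7]}
f_stat, df_b, df_w, mse = one_way_anova(data)
print(f_stat, df_b, df_w)
print(fisher_lsd_t(data, mse))
```

Fisher's LSD is normally interpreted only when the overall F test is significant, which matches the "when necessary" wording above.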
Quantitative LC–MS/MS analysis
Analysis of synaptic number Samples were spiked with either a total of 120 or 240 fmol of casein and
Synaptic number was analysed as previously described9. In brief, P42 reduced with 10 mM dithiothreitol for 30 min at 80 °C and alkylated
control and experimental tissue sections were stained with an antibody with 20 mM iodoacetamide for 45 min at room temperature, then sup-
against mCherry, biotinylated proteins and the following antibodies plemented with a final concentration of 1.2% phosphoric acid and 328 μl
against pre- and postsynaptic protein pairs: VGLUT1 and PSD95 (markers of S-Trap (Protifi) binding buffer (90% methanol, 100 mM triethylam-
of excitatory synapses) and VGAT and gephyrin (markers of inhibitory monium bicarbonate (TEAB)). Proteins were trapped on the S-Trap,
synapses). Five-micrometre-thick Z-stacks of 15 optical sections of digested using 20 ng μl−1 sequencing grade trypsin (Promega) for
1 h at 47 °C, and eluted using 50 mM TEAB, followed by 0.2% formic acid 15 min at 4 °C, was incubated with GFP-Trap Agarose beads (Chromotek)
(FA), and lastly using 50% acetonitrile, 0.2% FA. All samples were then at 4 °C overnight. For the protein expression assay from cultured astro-
lyophilized to dryness and resuspended in 12 μl 1% trifluoroacetic acid, cytes, the cell was lysed in 25 mM Tris, pH 7.4, 150 mM NaCl, 1 mM CaCl2,
2% acetonitrile containing 12.5 fmol μl−1 yeast alcohol dehydrogenase. 1 mM MgCl2, 0.5% NP-40 and protease inhibitor mixture. The lysed
From each sample, 3 μl was removed to create a QC pool sample which samples were centrifuged at 15,000g for 5 min at 4 °C. For the endog-
was run periodically throughout the acquisition period. enous NRCAM binding assay, juvenile mouse cortex (P42) was lysed
Quantitative LC–MS/MS was performed on 2 μl of each sample, in 25 mM HEPES, pH 7.5; 150 mM NaCl; 1 mM EDTA; 1% NP-40; protease
using a nanoAcquity UPLC system (Waters Corp) coupled to a Thermo inhibitor mixture; and phosphatase inhibitor mixture. The lysed sam-
Orbitrap Fusion Lumos high resolution accurate mass tandem mass ples were centrifuged at 15,000g for 10 min at 4 °C. Supernatant was
spectrometer (Thermo) via a nanoelectrospray ionization source. In pre-cleared at 4 °C for 30 min with Protein G Sepharose beads (Mil-
brief, the sample was first trapped on a Symmetry C18 20 mm × 180 μm lipore). NRCAM was immunoprecipitated with anti-NRCAM antibody
trapping column (5 μl min−1 at 99.9/0.1 v/v water/acetonitrile), after followed by Protein G Sepharose beads overnight at 4 °C. SDS–PAGE
which the analytical separation was performed using a 1.8 μm Acquity and immunoblotting were performed as previously described48,49.
HSS T3 C18 75 μm × 250 mm column (Waters) with a 90-min linear gradi- The data were obtained with Odyssey Software v.4. Full gel images are
ent of 5–30% acetonitrile with 0.1% formic acid at a flow rate of 400 nl shown in Supplementary Fig. 1.
min−1 with a column temperature of 55 °C. Data collection on the Fusion
Lumos mass spectrometer was performed in a data-dependent acquisi- Electrophysiological analysis
tion (DDA) mode of acquisition with r = 120,000 (at m/z 200) full MS For whole-cell patch clamp recordings, P42–48 mice were decapi-
scan from m/z 375 to 1,500 with a target AGC value of 2 × 10⁵ ions. MS/MS tated under deep isoflurane anaesthesia. Brains were removed and
scans were acquired at rapid scan rate in the linear ion trap with an 300-μm sagittal slices were prepared in ice cold, oxygenated cut-
AGC target of 5 × 10³ ions and a max injection time of 100 ms. The total ting solution containing (in mM) 85 NaCl, 3 KCl, 1.3 MgSO4·7H2O,
cycle time for MS and MS/MS scans was 2 s. A 20 s dynamic exclusion 1.25 NaH2PO4·H2O, 26 NaHCO3, 25 dextrose, 2.5 CaCl2, and 75 sucrose
was employed to increase depth of coverage. at ~320 mOsm l−1, with a vibratome (Leica VT 1000S). Slices were
Following data collection, data were imported into Proteome recovered for 30 min at 31.5 °C in 95% O2, 5% CO2 bubbled artificial
Discoverer 2.2 (Thermo Scientific), and individual LC–MS .raw files cerebrospinal fluid (ACSF) (containing (in mM) 124 NaCl, 3 KCl,
were aligned on the basis of the accurate mass and retention time of 1.3 MgSO4·7H2O, 1.25 NaH2PO4·H2O, 26 NaHCO3, 10 dextrose, 2.5 CaCl2
detected ions (‘features’) using the Minora Feature Detector algorithm at ~310 mOsm l−1) and then at room temperature for at least 1 h. Slices
in Proteome Discoverer. Relative peptide abundance was calculated were superfused with oxygenated ACSF at room temperature. To
based on peak intensities following integration of selected ion chro- isolate mIPSCs, 50 μM D-APV, 10 μM NBQX and 0.5 μM TTX were added
matograms of the aligned features across all runs. The MS/MS data were to ACSF. To isolate mEPSCs, 0.5 μM picrotoxin and 0.5 μM tetrodo-
searched against a SwissProt Mus musculus database (downloaded in toxin (TTX) were added to ACSF. V1 cells were visually identified under
Apr 2018) containing an equal number of reversed-sequence ‘decoys’ Zeiss Axio Examiner.D1 microscope with 20× dipping objective and
for false discovery rate determination. Mascot Distiller and Mascot IR-1000 camera (DAGE-MTI) using an IR bandpass filter. Cortical
Server (v.2.5, Matrix Sciences) were used to produce fragment ion cells in layer 2/3 were patched using glass pipettes (4–7 MΩ resist-
spectra and to perform the database searches using full trypsin ance) made from borosilicate glass capillaries (Sutter Instrument)
enzyme rules with 5 ppm precursor and 0.8 Da product ion match using a P-97 puller (Sutter Instrument). Pipettes were filled with
tolerances. Database search parameters included fixed modifica- internal solution containing: (in mM) 135 CsMeSO3, 8 NaCl, 10 HEPES,
tion on cysteine (carbamidomethyl) and variable modifications on 0.3 EGTA, 10 Na2-phosphocreatine, 4 MgATP, 0.3 Na2GTP, 5 TEA-Cl,
methionine (oxidation) and asparagine and glutamine (deamida- 5 QX-314 at ~290 mOsm l−1. Miniature post-synaptic currents were
tion). Peptide Validator and Protein FDR Validator nodes in Proteome measured at −70 mV. Series resistance was monitored throughout
Discoverer were used to annotate the data at a maximum 1% protein all recordings and only recordings that remained stable over the
false discovery rate. recording period (≤30 MΩ resistance and <20% change in resist-
ance) were included. Data were recorded using a Multiclamp 700B
Split-TurboID protein network amplifier (Molecular Devices), digitized at 50 kHz using a Digidata
Split-TurboID and TurboID-surface protein networks were performed 1550 digitizer (Molecular Devices), and low-pass filtered at 1 kHz. All
as previously described19,46 with modifications. Network figures were data were acquired using pClamp software and analysed in Clampfit
created using Cytoscape (v.3.7), with nodes corresponding to the (Molecular Devices) including only events larger than 5 pA. Events
gene name (multiple isoforms of proteins were collapsed into one were initially identified using a custom-made template and manu-
node based on gene nomenclature) for proteins identified in the prot- ally assessed for inclusion with the template search function. Rise
eomic analysis. The known protein–protein interaction networks were time was defined as the time from 10–90% of the peak. All chemicals
provided by the STRING, HitPredict, HPRD, BioGRID and APID databases. A were purchased from Sigma-Aldrich or Tocris. Experiments were
non-redundant list of protein–protein interactions was assembled performed blinded to the condition.
using the MGI database, GeneCard and UniProt (Supplementary
Tables 5–6). In all networks, node size is proportional to fold enrich- Statistical analysis
ment over soluble TurboID alone. However, the bait (Split-TurboID and Data are expressed as the mean ± s.e.m. Statistical analyses were
TurboID-surface) node sizes were set manually. Clustergrams were performed with GraphPad Prism version 6 (GraphPad Software).
created by manual inspection on the basis of Uniprot and GeneCard We compared independent sample means using t-tests and one-way
database annotation as previously described19,46. ANOVAs as appropriate. Statistically significant F values detected in
the ANOVAs were followed by alpha-adjusted post hoc tests (Tukey’s
Immunoprecipitation and immunoblotting honestly significant difference). We confirmed necessary paramet-
HEK 293T cells were transfected with Lipofectamine 2000 according ric test assumptions using the Shapiro–Wilk test (normality) and
to the manufacturer’s instructions. After 20 h, transfected HEK 293T Levene’s test (error variance homogeneity). P < 0.001, P < 0.01 and
cells were lysed in 25 mM HEPES, pH 7.5, 150 mM NaCl, 1 mM EDTA, 1% P < 0.05 were considered to indicate statistical significance. Sam-
NP-40, protease inhibitor mixture and phosphatase inhibitor mixture. ple size for each experiment is indicated in the figure legend for
The cell lysate, which was obtained by centrifugation at 15,000g for each experiment. Sample sizes were determined based on previous
experience for each experiment to yield high power to detect spe- 50. Ippolito, D. M, Eroglu, C. Quantifying synapses: an immunocytochemistry-based assay to
quantify synapse number. J. Vis. Exp. 16, 2270 (2010).
cific effects. No statistical methods were used to predetermine 51. Dani, A., Huang, B., Bergan, J., Dulac, C. & Zhuang, X. Superresolution imaging of
sample size. All results of the statistical analysis are shown in Sup- chemical synapses in the brain. Neuron 68, 843–856 (2010).
plementary Table 7.
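For readers reproducing the summary statistics, the "mean ± s.e.m." convention stated under "Statistical analysis" can be sketched as follows. This is a minimal standard-library illustration, not GraphPad Prism output. The function names and the example values are hypothetical, and the s.e.m. is assumed to be the sample standard deviation divided by the square root of n.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, s.e.m.) with s.e.m. = sample SD / sqrt(n)."""
    if len(values) < 2:
        raise ValueError("at least two observations are needed for s.e.m.")
    return mean(values), stdev(values) / sqrt(len(values))


def format_mean_sem(values, digits=2):
    """Format a sample as 'mean ± s.e.m.' for reporting."""
    m, s = mean_sem(values)
    return f"{m:.{digits}f} ± {s:.{digits}f}"


# Made-up measurements (illustrative only).
print(format_mean_sem([1, 2, 3]))
```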
Acknowledgements We thank H. Katsura for modifying the promoter of the surface and split
Reporting summary TurboID plasmids for HEK 293T cell expression, and B. Duncan for technical support. This work
was supported by Brain initiative RO1DA047258 from NIH (S.H.S. and C.E.), R01MH113280 from
Further information on research design is available in the Nature
NIH (P.F.M.), Kahn Neurotechnology Award (S.H.S. and C.E.), a Grant-in-Aid for JSPS Fellows
Research Reporting Summary linked to this paper. (PD) 20153173 from the Japan Society for the Promotion of Science (T.T.), The Uehara memorial
Foundation (T.T.), and National Institute of Mental Health Fellowship F30MH117851 (J.L.C).
Author contributions T.T., C.E. and S.H.S. designed the study. T.T., J.T.W., A.P., C.E. and S.H.S.
Data availability wrote the manuscript. T.T., J.T.W., A.U. and E.J.S. performed in vivo BioID-proteomics analysis.
Proteomics data are available in the MassIVE database under accession T.T., J.T.W., J.L.C., T.S. and P.F.M. produced the constructs. T.T., J.T.W. and K.T.B. performed
imaging analysis and the morphological analysis of the astrocytes. A.P. performed
MSV000085821. The data that support the findings of this study are
electrophysiological analysis. T.T. and K.T.B. performed the biological experiments. All authors
available from the corresponding author upon reasonable request. discussed the results and commented on the manuscript text.
46. Spence, E. F. et al. In vivo proximity proteomics of nascent synapses reveals a novel Competing interests The authors declare no competing interests.
regulator of cytoskeleton-mediated synaptic maturation. Nat. Commun. 10, 386
(2019). Additional information
47. Shin, J. H., Yue, Y. & Duan, D. Recombinant adeno-associated viral vector production and Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-
purification. Methods Mol. Biol. 798, 267–284 (2012). 2926-0.
48. Takano, T. et al. LMTK1 regulates dendritic formation by regulating movement of Correspondence and requests for materials should be addressed to T.T., C.E. or S.H.S.
Rab11A-positive endosomes. Mol. Biol. Cell 25, 1755–1768 (2014). Peer review information Nature thanks Thomas Biederer, Peter Scheiffele and the other,
49. Takano, T. et al. Discovery of long-range inhibitory signaling to ensure single axon anonymous, reviewer(s) for their contribution to the peer review of this work.
formation. Nat. Commun. 8, 33 (2017). Reprints and permissions information is available at http://www.nature.com/reprints.
Extended Data Fig. 1 | The reconstituted activity of Split-TurboID in infected with AAV1/2-GfaABC1D-TurboID-HA-surface, AAV1/2-hSynI-
neurons and astrocytes in vitro. a, Schematics of constructs tested. V5-N-TurboID and/or AAV1/2-GfaABC1D-C-TurboID-HA. Representative images
b, Immunoblot analysis of construct expression and biotinylation activity. of neuron and astrocyte at DIV14 after treatment with 500 μM biotin for 6 h are
c, Schematic of neuron-astrocyte mixed-culture assay for Split-TurboID with shown. n = 3 biological repeats.
cell-type-specific AAVs in vitro. d, Cultured neurons and astrocytes were
Extended Data Fig. 2 | Split-TurboID maps excitatory and inhibitory HOMER1 (b), inhibitory presynaptic marker VGAT (c), and postsynaptic marker
perisynaptic proteins. a–d, Representative images demonstrating that gephyrin (d). Astrocytes were visualized with GfaABC1D-mCherry-CAAX. n = 3
proteins biotinylated by astrocytic TurboID-surface or Split-TurboID (cyan) are biological repeats.
adjacent to excitatory presynaptic marker VGLUT1 (a), postsynaptic marker
Extended Data Fig. 3 | Brain-wide transduction of astrocytes and neurons. expression throughout the cortex and other structures. c, Representative
a, Schematic of AAV PHP.eB viruses for neuronal-EGFP or astrocyte-mCherry- image from cortex, hippocampus or cerebellum showing high coverage of
CAAX and retro-orbital injection. b, Sagittal section of mouse brain showing neuronal and astrocytic expression.

Extended Data Fig. 4 | Mapping and identification of tripartite synaptic identified by mass spectrometry and filters used to identify top candidates.
cleft proteins by Split-TurboID in vivo. a, Biotinylation activity of Split- h, Venn diagram comparing proteome list of Split-TurboID and TurboID-
TurboID in vivo. Lysates of mouse brain infected with cell-type-specific surface. i, Scale-free network of Split-TurboID (green) and TurboID-surface
TurboID-surface-HA, V5-N-TurboID and/or C-TurboID-HA. Brain lysates were (blue) identified proteins. High-confidence proteins enriched in both Split-
analysed by immunoblotting with anti-Streptavidin, anti-V5, anti-HA and anti- TurboID and TurboID-surface fractions are shown in red. Neuronal enriched
Tubulin antibodies. b, The graph indicates the ratio of biotinylation activity proteins (RNA-seq expression ratio <1) and astrocyte enriched proteins (RNA-
in vivo (n = 4 brains per each condition). c, d, The biotinylation of Split-TurboID seq expression ratio ≥1.0) are represented as circle or diamond, respectively.
in mouse cortex. e, f, Quantification of average number of excitatory or At least n = 4 biological repeats. One-way ANOVA (Dunnett’s multiple
inhibitory synaptic colocalized puncta in layer 2/3 of the visual cortex. n = 15 comparison, P < 0.0001, 0.001). Data are means ± s.e.m.
slices per each condition from 3 mice. g, Chart summarizing proteomic data set

Extended Data Fig. 5 | The validation of candidate proteins with CRISPR- co-infected with AAV PHP.eB-GfaABC1D-Cas9 in Flex-TdTomato mice at P21.
based astrocytic candidate gene depletion strategy. a, Schematic of Coronal sections were prepared and immunostained with an anti-TdTomato
CRISPR-based deletion of astrocytic NrCAM in vitro. b, Immunoblots showing antibody. g, A high-magnification image is shown. h, Images of Tenm2-, Tenm4-
loss of NrCAM with sgRNA. AAV1/2-U6-empty sgRNA or AAV1/2-U6-NrCAM or NrCAM-deleted astrocytes (cyan) and their territories (red outlines) in visual
sgRNA was co-infected with AAV1/2- GfaABC1D-Cas9 to cultured neurons and cortexes of juvenile mice. i, Average territory volumes at P42 of Tenm2-,
astrocytes at DIV14. The cells were subjected to immunoblot analysis with an Tenm4- or NrCAM-deleted astrocytes. Between 20 and 25 cells per condition
anti-NrCAM antibody. Tubulin was used as a loading control. c, The bar graph from 3 mice. j, Images of Tenm2-, Tenm4- or NrCAM-deleted astrocytes (cyan)
indicates the expression level of NrCAM from 3 independent experiments. and their NIV reconstructions (orange) in visual cortexes of juvenile mice.
d, Schematic of CRISPR-based deletion strategy of candidate gene. k, Average NIV at P42 of Tenm2-, Tenm4- or NrCAM-deleted astrocytes. 51 cells
e, Experimental timeline of AAV-mediated CRISPR-based astrocytic gene per each condition from 3 mice. n = 3 biological repeats. One-way ANOVA
deletion strategy in Flex-TdTomato mice. f, AAV PHP.eB-U6-NrCAM sgRNA was (Dunnett’s multiple comparison, P < 0.0001, 0.01). Data are means ± s.e.m.
Extended Data Fig. 6 | NrCAM is a novel tripartite synaptic protein. a, A high sections were immunostained with anti-NrCAM antibody (cyan). A high-
magnification STED image showing that endogenous NrCAM was enriched at magnification image is shown (right panel). e, Schematic of the visualization
biotinylated proteins in vivo. b, Immunoblot analysis of endogenous NrCAM, of both astrocytic and neuronal NrCAM in vivo. f, STED images demonstrating
astrocyte marker GFAP, neuronal marker β-Tubulin III or loading control the colocalization of astrocytic NrCAM with neuronal NrCAM in vivo.
α-Tubulin from mouse brain or purified astrocyte lysate. c, Schematic of the Coronal sections were prepared and co-immunostained with an anti-V5 (cyan)
visualization of astrocytic membrane and endogenous NrCAM in vivo. d, STED and anti-HA (magenta) antibody. A high-magnification image is shown on the
images demonstrating the localization of endogenous NrCAM in vivo. Coronal right. n = 3 biological repeats. Data represent means ± s.e.m.
Extended Data Fig. 7 | The role of astrocytic NrCAM in astrocytic constructs of sgRNA-resistant human NrCAM, neuronal NrCAM deletion
morphogenesis in vivo. a, Schematic of CRISPR-based NrCAM deletion (neuroNrCAM sgRNA), or following neuronal NrCAM deletion alone. Images at
in vivo. b, Schematic of hNrCAM domains and fragments. SP, signal peptide; indicated ages represent. e, i, Analysis of astrocyte territory, 15–29 cells per
IG, immunoglobulin; FN, fibronectin; TMD, transmembrane domain; ECD, each condition from 3 mice; g, k, Analysis of neuropil infiltration volume.
extracellular domains. c, Immunoblots showing the expression of each NrCAM 50–51 cells per each condition from 3 mice. n = 3 biological repeats. One-way
fragments in HEK293T cells. d, f, h, j, Images of astrocytes following deletion of ANOVA (Dunnett’s multiple comparison, P < 0.0001). Data represent
astrocyte NrCAM alone (NrCAM sgRNA), with coexpression with indicated means ± s.e.m.

Extended Data Fig. 8 | NrCAM controls inhibitory synaptic specializations integrated intensity of VGAT (Cont = 258, NL2 = 222, NrCAM = 242, NrCAM-
through binding the gephyrin. a, Immunoblot analysis of endogenous ΔIG = 288, NrCAM-ΔECD = 303 cells) or GABAA receptor (Cont = 313,
NrCAM, astrocyte marker GFAP, Neuroligin 2, Neuroligin 3, Kir4.1 or EAAT2 NRX1β4(-) = 310, NrCAM = 300, NrCAM-ΔIG = 278, NrCAM-ΔECD = 278 cells)
(GLT1) from purified astrocyte lysate. b, The bar graph indicates the expression clusters that contact transfected HEK293T cells. i–l, Images of in vitro
level. c, The interaction of NrCAM with PSD95 and gephyrin in HEK293T cells. excitatory synapse formation assay. The graph shows average of the total
Cell lysates coexpressing NrCAM-HA with GFP, PSD95-GFP or GFP-gephyrin integrated intensity of VGLUT1 (Cont = 259, NL2 = 306, NrCAM = 286, NrCAM-
were incubated with anti-GFP-bound beads. Immunoprecipitated (right) or ΔIG = 321, NrCAM-ΔECD = 196 cells) or HOMER1 (Cont = 471, NRX1β4(-) = 214,
total (left) NrCAM, GFP, PSD95-GFP or GFP-gephyrin were detected by NrCAM = 247, NrCAM-ΔIG = 387, NrCAM-ΔECD = 251 cells) clusters that contact
immunoblotting with anti-HA and anti-GFP antibodies. d, Schematic of transfected HEK293T cells. n = 3 biological repeats. One-way ANOVA (Dunnett’s
HEK293T/neuronal mixed-cultured assay in vitro. e–h, Images of in vitro multiple comparison, P < 0.0001). Data are means ± s.e.m.
inhibitory synapse formation assays. The graph shows average of the total
Extended Data Fig. 9 | The effect of NrCAM on excitatory synapse c, mEPSC traces from L2/3 pyramidal neurons following astrocyte control
formation and function in vivo. a, Images of postsynapse PSD95 and empty sgRNA or NrCAM sgRNA expression. d–g, Quantification of mEPSC
presynapse VGLUT1 within NrCAM-deletion astrocytes in L1 of the visual amplitude (d, e, Cont = 16, NrCAM sgRNA = 14 cells from 4 mice) and frequency
cortex. High magnification images (bottom) correspond to boxes (above). (f, g, Cont = 14, NrCAM sgRNA = 17 cells from each of 4 mice). At least n = 3
b, Quantification of average number of excitatory synaptic colocalized puncta biological repeats. Student’s t-test (paired, P < 0.05). Data represent
within astrocyte territories. n = 15 cells per each condition from 3 mice. means ± s.e.m.
Extended Data Fig. 10 | In vivo chemogenetics method, Split-TurboID, reveals a novel astrocytic cell adhesion molecule, NrCAM, that controls inhibitory synaptic organization. Development of in vivo chemo-affinity codes, Split-TurboID, and a working model of astrocytic NrCAM influencing inhibitory synaptic function. Split-TurboID can map the molecular composition of such intercellular contacts, even within the highly complex structure of the tripartite synapse in vivo. Mapping this interface, we discovered a new molecular mechanism by which astrocytes influence inhibitory synapses within the tripartite synaptic cleft via NrCAM. NrCAM is expressed in cortical astrocytes where it interacts with neuronal NrCAM that is coupled to gephyrin at inhibitory postsynapses. Loss of astrocytic NrCAM dramatically alters inhibitory synaptic organization and function in vivo.
Corresponding author(s): Scott H. Soderling, Cagla Eroglu and Tetsuya Takano
Reporting Summary
Statistics
n/a Confirmed

Software and code

Data collection Zen Software (Zen black 2.3, Zen black 2.1 SP3 FP1), Leica Application Suite (LAS) software (v3.5.5), Odyssey Software (v4), Proteome
Discoverer 2.2 (Thermo Scientific Inc.), GraphPad Prism (v8) and pClamp (v10) were used for data collection.
Data analysis Imaris software (v8.2.1), Leica Application Suite (LAS) software (v3.5.5), ImageJ (v10.2), and Mascot Distiller and Mascot Server (v2.5,
Matrix Sciences) were used for data analysis. The Minora Feature Detection algorithm is part of the Proteome Discoverer package, version 2.2.
Data
April 2020

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Sample size Sample sizes were determined based on previous experience (Stogsdill et al., Nature, 2017) for each experiment to yield high power to detect
specific effects. No statistical methods were used to predetermine sample size.
Data exclusions No data were excluded from the analyses.
Replication All attempts at replication (3-5 times) were successful.
Randomization Allocation was random.
Blinding Image data collection and statistical analyses were performed blinded to the experimental conditions.


Antibodies
Antibodies used The following antibodies were used: monoclonal anti-V5 (ThermoFisher, R960-25, IB 1:1000, IF 1:500, IHC 1:500), rat anti-HA (Sigma,
12158167001, IB 1:1000, IF 1:500, IHC 1:200), mouse anti-HA (Biolegend, MMS-101P, IB 1:1000), chicken anti-GFP (Abcam, ab13970,
IB 1:1000, IF 1:1000, IHC 1:1000), rabbit anti-mCherry (Abcam, ab167453, IF 1:500, IHC 1:500), rabbit anti-PSD95 (Life Technologies,
51-6900, IHC 1:200), mouse anti-PSD-95 (ThermoFisher, 7E3, IB 1:1000), guinea pig anti-VGLUT1 (Synaptic Systems, 135-304, IF
1:1000, IHC 1:1000), rabbit anti-gephyrin (Synaptic Systems, 147-002, IF 1:1000, IHC 1:500), mouse anti-gephyrin (Synaptic Systems,
147-011, IB 1:1000, IF 1:300), guinea pig anti-VGAT (Synaptic Systems, 131-004, IF 1:1000, IHC 1:500), rabbit anti-NL2 (Synaptic
Systems, 129-202, IB 1:500), rabbit anti-NrCAM (Abcam, ab24344, IB 1:1000, IHC 1:200), rabbit anti-Homer1 (Synaptic Systems,
160002, IF 1:2000), rabbit anti-GABA-A receptor 2 (Synaptic Systems, 224-803, IF 1:1000), goat anti-Neuropilin-2 (R & D Systems,
AF567, IB 1:500), rat anti-tdTomato (Kerafast, EST203, IHC 1:1000), rat anti-Tubulin (Santa Cruz, sc-53029, IB 1:1000), rabbit anti-
Ezrin (Cell Signaling, #3142, IHC 1:200), rabbit anti-EAAT2 (GLT1) (Alomone, AGC-022, IB 1:1000), rabbit anti-Kir4.1 (Alomone,
APC-035, IB 1:500), rabbit anti-NL3 (Novus, NBP1-90080, IB 1:500), Alexa Fluor 488 Goat anti-Mouse (ThermoFisher, A32723), Alexa
Fluor 488 Goat anti-Rabbit (ThermoFisher, A-11034), Alexa Fluor 488 Goat anti-Guinea pig (ThermoFisher, A11073), Alexa Fluor 488
Goat anti-Chicken (ThermoFisher, A-11006), Oregon Green 488 Goat anti-Rabbit (ThermoFisher, O-11038), Alexa Fluor 555 Goat anti-
Rabbit (ThermoFisher, A21428), Alexa Fluor 568 Goat anti-Rat (ThermoFisher, A-11077), Alexa Fluor 594 Streptavidin (ThermoFisher,
S11227), Alexa Fluor 647 Donkey anti-rabbit (ThermoFisher, A31573), Alexa Fluor 647 Goat anti-Chicken (ThermoFisher, A-21449),
Alexa Fluor 647 Donkey anti-Guinea pig (Jackson ImmunoResearch, 706-605-148), Alexa Fluor 647 Streptavidin (ThermoFisher,
S21374), Atto647N anti-Mouse (Sigma, 50185), Atto647N anti-rabbit (Sigma, 40839), Donkey anti-Goat IRDye 800CW (LI-COR,
926-32214), Goat anti-rat IRDye 800CW (LI-COR, 925-32219), Goat anti-Mouse IRDye 680RD (LI-COR, 925-6818).
Validation 1 monoclonal anti-V5 ThermoFisher R960-25 ELISA, Immunocytochemistry, Immunofluorescence, Immunoprecipitation, Western
Blot Vendor (IB, IF)
2 rat anti-HA Sigma 12158167001 ELISA, Immunocytochemistry, Immunofluorescence, Immunoprecipitation, Western Blot
"Hougbing Liu et al., 2014. J Am Heart Assoc 20; 3(3) (IB)
Fimiani et al., 2015. Nucleic Acids Res 18;43(16) (IB, IF)
Stogsdill et al., 2017. Nature "

3 mouse anti-HA Biolegend MMS-101P western blot (WB), immunocytochemistry (ICC), immunoprecipitation (IP), and flow
cytometry (FC). "Kim JY, et al. 2003. J Neurosci. 23:5561. (IP, WB)
Helliwell SB, et al. 2001. J Cell Biol. 153:649. (WB)
Bennett BD, et al. 2000. J Biol Chem. 275:37712. (IF, IP, WB)
Royer Y, et al. 2005. J. Biol. Chem. 29:27251. (FC)"
4 chicken anti-GFP Abcam ab13970 IHC-P, WB, IHC - Wholemount, IHC-FrFl, ICC/IF, IHC-Fr, IHC-FoFr Vendor (IB, IF, IHC)
5 rabbit anti-mCherry Abcam ab167453 WB, ICC/IF, IHC-P "Stogsdill et al., 2017. Nature 551, 192-197 (IF, IHC)
Vendor (IB, IF)"
6 rabbit anti-PSD95 Life Technologies 51-6900 human, mouse, rat ELISA, ICC, IF, IP, WB "Vendor (IB, IF, IHC)
Stogsdill et al., 2017. Nature 551, 192-197 (IF, IHC)"
7 mouse anti-PSD-95 ThermoFisher MA1-046 human, mouse, rat, xenopus Flow, ICC, IF, IHC, IP, WB Vendor (IF, IB)
8 guinea pig anti-VGLUT1 Synaptic Systems 135-304 rat, mouse, human, cow WB, IP, ICC, IHC, EM, FACS Vendor (IB, IF, IHC)
9 rabbit anti-gephyrin Synaptic Systems 147-008 human, rat, mouse, pig, goldfish, zebrafish WB, IP, ICC, IHC Vendor (ICC, IHC)
10 mouse anti-gephyrin Synaptic Systems 147-011 human, rat, mouse, pig, goldfish, zebrafish, chicken WB, IP, ICC, IHC, EM
"Davenport EC, Szulc BR, Drew J, Taylor J, Morgan T, Higgs NF, López-Doménech G, Kittler JT
Cell reports (2019) 268: 2037-2051.e6. 147 011 (WB, ICC, IHC)
Vendor (ICC, IHC)"
11 guinea pig anti-VGAT Synaptic Systems 131-004 rat, mouse, zebrafish, ape WB, IP, ICC, IHC, EM Vendor (WB, ICC, IHC)
12 rabbit anti-NL2 Synaptic Systems 129-202 human, rat, mouse, monkey, ape, cow WB, IP, ICC, IHC Stogsdill et al., 2017. Nature 551,
192-197 (IB)
13 rabbit anti-NrCAM Abcam ab24344 mouse, rat, human IHC-Fr, IHC-P, ICC/IF, WB, IP, IHC-FoFr Demynanenko et al., J Neurosci
34:1127-87 (IB, IHC)
14 rabbit anti-Homer1 Synaptic Systems 160002 human, rat, mouse WB, IP, ICC, IHC Vendor (WB, ICC, IHC)
15 rabbit anti-GABA-A receptor b2 Synaptic Systems 224-803 rat, mouse WB, IP, ICC, IHC Vendor (WB, ICC, IHC)
16 goat anti-Neuropilin-2 R & D Systems AF567 human, rat, mouse WB, IP, ICC, IHC, FACS Demynanenko et al., J Neurosci 34:1127-87
(IB)
17 rat anti-tdTomato Kerafast EST203 WB, ELISA, IF, IHC, IP "Stogsdill et al., 2017. Nature (IHC)
Vendor (IF)"
18 rat anti-Tubulin Santa Cruz sc-53029 mouse, human, rat WB, IP, ICC, IHC, EM, FACS Vendor (IB, IF, IHC)
19 rabbit anti-Ezrin Cell Signaling #3142 human, mouse, rat, monkey, bovine WB Vendor (WB)
20 rabbit anti-EAAT2 (GLT1) Alomone AGC-022 human, mouse, rat ICC, IF, IHC, LCI, WB Vendor (IB, IF, IHC)
21 rabbit anti-Kir4.1 Alomone APC-035 human, mouse, rat ICC, IF, IHC, IP, WB Vendor (IB)
22 rabbit anti-NL3 Novus NBP1-90080 human, mouse, rat WB, IHC "Stogsdill et al., 2017. Nature 551, 192-197 (IB)
Vendor (IB)"
Eukaryotic cell lines

Policy information about cell lines
Cell line source(s) HEK293T cell line was obtained from the Duke Cell Culture Facility.
Authentication The cell line was validated by STR testing.
Mycoplasma contamination The cell lines were tested for mycoplasma contamination and were negative.
Commonly misidentified lines No commonly misidentified cell lines were used in the study.
(See ICLAC register)

Laboratory animals P1~P42, male and female CD1 (022, Charles River), Cas9 (028239,JAX) and Ai14 (007914,JAX) mice. All animals were housed at 72F
+/-2 degrees at 30-70% humidity.
Wild animals This study did not involve wild animals.
Field-collected samples This study did not involve samples collected from the field.
Ethics oversight The Duke University Institutional Animal Care and Use Committee provided ethical approval and guidance.
Article
The gut microbiota is associated with immune cell dynamics in humans

https://doi.org/10.1038/s41586-020-2971-8
Received: 3 May 2019
Accepted: 30 September 2020

Jonas Schluter1,2 ✉, Jonathan U. Peled3,4, Bradford P. Taylor2, Kate A. Markey3,4, Melody Smith3,4, Ying Taur5, Rene Niehus6, Anna Staffas7, Anqi Dai3, Emily Fontana5, Luigi A. Amoretti5, Roberta J. Wright5, Sejal Morjaria5, Maly Fenelus8, Melissa S. Pessin8, Nelson J. Chao9, Meagan Lew9, Lauren Bohannon9, Amy Bush9, Anthony D. Sung9, Tobias M. Hohl5, Miguel-Angel Perales3,4, Marcel R. M. van den Brink3,4 & Joao B. Xavier2 ✉

The gut microbiota influences development1–3 and homeostasis4–7 of the mammalian
immune system, and is associated with human inflammatory8 and immune diseases9,10
as well as responses to immunotherapy11–14. Nevertheless, our understanding of how
gut bacteria modulate the immune system remains limited, particularly in humans,
where the difficulty of direct experimentation makes inference challenging. Here we
study hundreds of hospitalized—and closely monitored—patients with cancer
receiving haematopoietic cell transplantation as they recover from chemotherapy
and stem-cell engraftment. This aggressive treatment causes large shifts in both
circulatory immune cell and microbiota populations, enabling the relationships
between the two to be studied simultaneously. Analysis of observed daily changes in
circulating neutrophil, lymphocyte and monocyte counts and more than 10,000
longitudinal microbiota samples revealed consistent associations between gut
bacteria and immune cell dynamics. High-resolution clinical metadata and Bayesian
inference allowed us to compare the effects of bacterial genera in relation to those of
immunomodulatory medications, revealing a considerable influence of the gut
microbiota—together and over time—on systemic immune cell dynamics. Our analysis
establishes and quantifies the link between the gut microbiota and the human
immune system, with implications for microbiota-driven modulation of immunity.
The human gut microbiota is considered a major modulator of the immune system during development3 and in health and disease8,9. For example, preterm infants have distinct microbiome compositions and distinct developmental trajectories of peripheral immune cell populations3. In adults, the success of immunotherapies that rely on peripheral immune cells, such as checkpoint inhibitor treatments, has been associated with the composition of the microbiome11–13,15. There is an increasing interest in using the microbiome to modulate the immune system and augment treatments7,16, including the growing field of chimeric antigen receptor T cell therapy17. However, our understanding of how the microbiota influences the dynamics of immune cells in humans and how this compares to deliberate immunomodulatory interventions remains limited owing to a lack of feasible experiments in human subjects.
To overcome this limitation, we investigated whether the gut microbiota could influence day-by-day changes in peripheral immune cell counts. We collected a vast dataset of immune-reconstitution dynamics after allogeneic haematopoietic cell transplantation (HCT) from individuals treated at Memorial Sloan Kettering Cancer Centre (MSK) between 2003 and 2019 (Fig. 1a, Supplementary Table 1). The conditioning regimen of radiation and chemotherapy administered to patients before HCT is the most severe perturbation to the immune system deliberately performed in humans: this offers a unique opportunity to investigate links between the gut microbiota and immune dynamics directly in humans.
Conditioning depletes white blood cell (WBC) counts, leading to neutropenia (less than 500 neutrophils per μl blood) until transplanted stem cells begin to release granulocytes from the bone marrow, initiating immune reconstitution (Fig. 1a–c). HCT also damages the gut microbiota18 and reduces its biodiversity (Fig. 1d–i), a collateral effect associated with increased mortality in patients undergoing HCT19. Immune and microbiome reconstitution vary considerably between patients and treatment types (Fig. 1, Extended Data Fig. 1a), enabling analyses of associations between microbiome and immune system, and their comparison with immunomodulators such as granulocyte-colony stimulating factor (GCSF).
To detect a directional and causal link between the microbiota and circulatory WBCs, we first used data from a randomized trial of autologous
1Institute for Computational Medicine, NYU Langone Health, New York, NY, USA. 2Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York, NY, USA. 3Adult Bone Marrow Transplantation Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA. 4Weill Cornell Medical College, New York, NY, USA. 5Infectious Disease Service, Department of Medicine, and Immunology Program, Sloan Kettering Institute, New York, NY, USA. 6Harvard University, T. H. Chan School of Public Health, Boston, MA, USA. 7Sahlgrenska Cancer Center, Department of Microbiology and Immunology, Institute of Biomedicine, University of Gothenburg, Gothenburg, Sweden. 8Department of Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA. 9Division of Hematologic Malignancies and Cellular Therapy, Duke University School of Medicine, Durham, NC, USA. ✉e-mail: jonas.schluter@nyulangone.org; xavierj@mskcc.org

[Fig. 1 graphic, panels a–i. a–c, Neutrophil, lymphocyte and monocyte counts (×1,000 μl–1) against time after HCT (d): a, mean (n = 2,335) with HCT phases I, II and III marked; b, patient 1 (PBSC) and c, patient 2 (cord), with GCSF and neutrophil engraftment marked. d–f, Diversity against time after HCT (d); d, mean (n = 1,294). g–i, Relative abundance against time after HCT (d); g, mean (n = 1,294). Family legend: Akkermansiaceae, Actinomycetaceae, Bacteroidaceae, Bifidobacteriaceae, Clostridiaceae 1, Enterobacteriaceae, Enterococcaceae, Erysipelotrichaceae, Lachnospiraceae, Lactobacillaceae, Peptostreptococcaceae, Ruminococcaceae, Staphylococcaceae, Streptococcaceae, Veillonellaceae, other families.]
Fig. 1 | Immune reconstitution and microbiome dynamics after HCT. a–c, Major phases of HCT: immunoablation during conditioning before HCT on day 0 (I) is followed by post-HCT neutropenia (II) and reconstitution (III). Daily mean counts (shaded area indicates s.d.) of neutrophils, lymphocytes and monocytes from individuals receiving transplants between 2003 and 2019 (a), compared with those from individuals (b, c) representative of the recovery trajectories for different stem-cell graft sources. Patient 1 received a PBSC graft and patient 2 received umbilical-cord blood. Line with circles shows data from the patient; solid line and shaded region show mean ± s.d. for all patients receiving transplants from the same source. In a, coloured bars on the left indicate the range of cell counts in healthy individuals. d–f, Loss of microbial diversity during HCT, measured by 16S rRNA gene sequencing of faecal samples, supporting previous smaller studies23,24. In d, the line shows daily mean across patients, shaded area shows s.d. e, f, Data from individual patients. g–i, Relative abundance of commensal bacteria families. g, Mean (± s.d.) relative abundance across all patients. h, i, Relative abundance in individual patients.
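The three phases in Fig. 1 can be illustrated with a small sketch. It uses the two thresholds quoted in the text: neutropenia as fewer than 500 neutrophils per μl, and engraftment as 3 consecutive days with over 500 neutrophils per μl. Everything else is an assumption made for illustration and is not taken from the paper: the function names `engraftment_day` and `label_phases`, dating engraftment to the first day of the qualifying run, requiring at least one neutropenic day before that run, treating every post-HCT day before engraftment as phase II, and the input being one count per consecutive day.

```python
from typing import List, Optional, Sequence

THRESHOLD = 500.0  # neutrophils per microlitre, the cut-off quoted in the text


def engraftment_day(days: Sequence[int], counts: Sequence[float],
                    run_length: int = 3) -> Optional[int]:
    """First day of the first run of `run_length` consecutive post-HCT days
    with counts above THRESHOLD, or None if there is no such run.

    Assumptions (not from the paper): the run is only counted once a
    neutropenic day has been seen, and engraftment is dated to the first
    day of the run. `days` must hold consecutive daily values.
    """
    seen_neutropenia = False
    run = 0
    start: Optional[int] = None
    for day, count in zip(days, counts):
        if day < 0:
            continue  # conditioning happens before HCT on day 0
        if count < THRESHOLD:
            seen_neutropenia = True
        if seen_neutropenia and count > THRESHOLD:
            if run == 0:
                start = day
            run += 1
            if run == run_length:
                return start
        else:
            run = 0
    return None


def label_phases(days: Sequence[int], counts: Sequence[float]) -> List[str]:
    """Label each day I (before HCT), II (post-HCT, before engraftment)
    or III (from engraftment onwards)."""
    engrafted = engraftment_day(days, counts)
    labels = []
    for day in days:
        if day < 0:
            labels.append("I")
        elif engrafted is None or day < engrafted:
            labels.append("II")
        else:
            labels.append("III")
    return labels
```

For a hypothetical patient followed from day −2 to day 9 with counts 3000, 1200, 800, 300, 100, 0, 0, 600, 400, 700, 900 and 1500 per μl, the single day above 500 on day 5 does not qualify, and the sketch dates engraftment to day 7.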
faecal microbiota transplantation (auto-FMT), a controlled microbiota manipulation experiment performed directly on our patients20 (Extended Data Fig. 2a). To investigate whether auto-FMT affected WBC reconstitution, we compared the neutrophil, lymphocyte and monocyte counts after neutrophil engraftment in 24 individuals (engraftment defined as 3 consecutive days with over 500 neutrophils per μl). FMTs were conducted at variable dates relative to neutrophil engraftment (Fig. 2a, Supplementary Table 2). Overall, we observed higher counts of each WBC type in individuals who received an auto-FMT during the first 100 days after neutrophil engraftment (P < 0.001, Fig. 2b, c; total WBCs, Extended Data Fig. 2b–g).
The higher WBCs in individuals receiving auto-FMT could result from the successful reconstitution of a complex microbiota20 and associated metabolic capabilities21, or they could be a systemic response to a severe therapy that introduced billions of intestinal organisms at once via an enema (no enema was administered to controls20). Moreover, chance differences in extrinsic factors such as different immunomodulator medications may have affected this result owing to the small cohort size. Nonetheless, the results supported the notion that the microbiota can modulate the peripheral immune system. High lymphocyte counts during immune reconstitution have been associated with improved clinical outcomes22, and 3-year survival was positively associated with higher mean levels of WBCs during the 100 days after neutrophil engraftment in the individuals receiving HCT (hazard ratio = 0.91, P = 0.04). Identifying the taxa that modulate immune dynamics could therefore open new ways to improve immune reconstitution, which is critical for clinical outcomes.
To investigate the links between the gut microbiota and the dynamics of WBC recovery, we turned to our large observational cohort of patients receiving HCT. Homeostasis of circulatory WBC counts is a complex, dynamic process: WBCs are formed and released into the blood de novo by differentiation of haematopoietic progenitor cells from the bone marrow, and can be mobilized from thymus and lymph nodes (lymphocytes), and spleen, liver and lungs (neutrophils); WBCs can also migrate from the blood to other tissues when needed23. To identify modulators of these dynamic processes, we developed a two-stage approach analysing the changes of WBC counts between two days (Fig. 3a). Stage 1 served as a clinical- and metadata-feature-selection stage using blood and medication data of 1,096 patients without available microbiome information (Extended Data Fig. 1b shows data inclusion). Stage 2 was performed on data from an independent cohort of 841 different patients from whom concurrent microbiome samples were available to detect associations between microbiome and peripheral immune cell dynamics.
In stage 1, we analysed the changes in neutrophils, lymphocytes and monocytes during recovery from more than 20,000 pairs of post-engraftment blood samples separated by a single day (Fig. 3b). A cross-validated feature-selection approach detected medications and HCT parameters associated with WBC dynamics (Extended Data Fig. 3a–c, Supplementary Table 3). In stage 2, we sought to identify the additional contribution of the gut microbiome. We performed Bayesian inferences using data from different sets of patients with available microbiome samples (Supplementary Table 4). Stage 1 had identified, as expected, that the sources of stem-cell grafts are associated with immune reconstitution kinetics (for example, umbilical-cord blood is associated with slower kinetics than peripheral blood24 (peripheral blood stem cells (PBSCs))), and we therefore stratified our patients by graft source in stage 2. The dynamic systems model of stage 2 thus included bacterial genera as predictors of daily changes in WBC counts, in addition to the medications selected in stage 1, clinical features (conditioning intensity, age and sex), and the current state of the blood in the form of counts of neutrophils, lymphocytes, monocytes,

[Fig. 2 graphic, panels a–c. a, Neutrophils, lymphocytes and monocytes (×1,000 μl–1) against days relative to neutrophil engraftment (0 to 100) for control and FMT-treated individuals. b, Neutrophils, lymphocytes and monocytes (×1,000 μl–1) against weeks relative to FMT randomization date (0 to 10), FMT versus control. c, Coefficient estimates (post-FMT and intercept terms, ×1,000 μl–1) for neutrophils, lymphocytes and monocytes.]
Fig. 2 | Neutrophil, lymphocyte and monocyte counts increased in FMT-treated individuals in the weeks following treatment. a, Absolute counts of neutrophils, lymphocytes and monocytes in 10 control and 14 FMT-treated individuals after neutrophil engraftment. Vertical lines indicate randomization dates. b, Weekly mean cell counts aligned to the randomization date. Line shows weekly mean, shaded region shows 95% CI. c, Coefficient estimates from linear mixed-effects models of neutrophils, lymphocytes and monocytes over time indicate an increase of each WBC type induced by auto-FMT (corresponding coefficient, βFMT: P = 4 × 10^−11 (neutrophils), P = 2 × 10^−10 (lymphocytes) and P = 2 × 10^−16 (monocytes); full regression results are presented in the Supplementary Information). In b, c, N = 24 subjects, n = 921 blood samples.

eosinophils and platelets (Fig. 3a). The dataset comprised 841 individuals, but approximately 60% of the stool samples paired with a daily change in WBC counts were taken before neutrophil engraftment, that is, when WBC counts were zero. Nevertheless, more than 2,000 post-engraftment observations of WBC changes during immune reconstitution provided a large sample for analysis of dynamics (Fig. 3b). Stage 2 focused on data from the largest (Fig. 3c) cohort: PBSC graft recipients. We withheld the other cohorts (bone marrow (BM); T cell depleted ex vivo by CD34+ selection grafts (TCD); and umbilical cord (cord)) for validation scoring, and included data from patients treated at Duke University for additional validation (Supplementary Table 5).
Notably, as a verification of our approach, we detected associations between immunomodulator administrations and consequent immune cell dynamics that were consistent with known biological mechanisms (Fig. 3c, Extended Data Fig. 4a–f). The strongest association across all predictors is the well-known neutrophil-increasing effect of GCSF25; GCSF administration, used to accelerate recovery from chemotherapy-induced neutropenia25, was associated with a +140% increase in the rate of neutrophil changes from one day to the next (95% highest posterior density interval (HPDI95) (+114%, +170%)). This finding was observed in all MSK (v-score = 3, Fig. 3d) and Duke validation datasets (Extended Data Fig. 4g–j). Furthermore, we found a GCSF-associated increase of +43% (HPDI95 (+30%, +58%), v-score = 3) in monocyte rates and a smaller increase in lymphocyte rates (+16%, HPDI95 (+5%, +27%), v-score = 3). Neutrophil and lymphocyte rates decreased following antihistamine or immunosuppressive medication (cetirizine, −18%, HPDI95 (−35%, +5%); mycophenolate mofetil, −8%, HPDI95 (−15%, +1%)). Finally, less intensive chemotherapeutic conditioning regimens (non-ablative and reduced intensity) were associated with increased lymphocyte and monocyte rates (Extended Data Fig. 4c).
Beyond the mechanistically plausible associations of medications, our analysis detected associations between the current count of WBCs and their rates of change: negative associations of neutrophil and lymphocyte counts with the rates of monocytes, and negative associations of platelet and lymphocyte counts with the rates of lymphocytes and neutrophils (Fig. 3e). Conversely, we found positive associations between monocytes and the rates of each of the investigated WBC subsets. These associations, derived from daily counts of WBCs, could reflect a complex network underlying the regulation of blood immune cell composition23. More importantly, the associations quantified for these potential homeostatic feedbacks and medications provided a benchmark against which we could compare gut microbial taxa.
We identified bacterial genera that consistently associated with WBC dynamics (Fig. 3f). Higher relative abundances of Faecalibacterium (+8%, HPDI95 (+1%, +14%) per unit log10 difference), Ruminococcus 2 (+5%, HPDI95 (0%, +10%) per log10 difference) and Akkermansia (+4%, HPDI95 (+1%, +7%) per log10 difference) were associated with increased neutrophil rates, whereas Rothia (−3%, HPDI95 (−7%, 0%) per log10 difference) and Clostridium sensu stricto 1 (−3%, HPDI95 (−6%, 0%) per log10 difference) associated with reduced neutrophil rates. These results were validated in univariate analyses of the Duke cohort (Extended Data Fig. 4g–i). We also used total bacterial abundances as predictors instead of relative abundances; this confirmed Faecalibacterium as most strongly associated with neutrophil dynamics (Extended Data Fig. 5). Ruminococcus 2 (+5%, HPDI95 (+1%, +9%) per log10 difference) and Staphylococcus (+4%, HPDI95 (+1%, +6%) per log10 difference) were positively associated with lymphocyte rates. Faecalibacterium and Ruminococcus 2 also associated with increases in monocyte rates; this association was validated in other cohorts (v-score 3 and 1, respectively), but there was higher uncertainty of the association estimate (HPDI50 > 0). Clostridium sensu stricto 1 (−3%, HPDI95 (−5%, −1%) per log10 difference) associated with decreased rates of monocytes. The associations we identified, and validated in other cohorts, between gut microbial taxa and daily changes in WBCs support the idea that haematopoiesis and mobilization respond to the composition of the gut microbiome, influencing systemic immunity26.
Intestinal bacteria may affect circulatory WBC counts by influencing either their sources in the bone marrow (or their cytokine profiles27 and proliferation rates in the blood), their sinks in different organs, or both. The immune system in turn can interact with the microbiota and modulate its composition, for example, via immunoglobulin A, as studied in mice28–30. To investigate the effect of the immune system on bacterial populations, we used an analogous approach to the stage 1 analysis. Dynamics of WBCs could be estimated from changes in absolute cell counts, and to obtain the necessary absolute bacterial counts, we measured total bacterial 16S rRNA gene copies per gram of stool for a subset of our samples (3,995 samples from 481 subjects) to jointly infer the bidirectional association network between microbiota and the peripheral immune system dynamics. All of our subjects received antibiotics at some point during their treatment18, and the strong influences of these antibiotics on microbiota dynamics were the dominant effects that survived feature selection (Extended Data Fig. 6). However, relaxing the regularization strength (Methods) revealed several bidirectional relationships between WBCs and gut bacterial dynamics (Extended Data Fig. 7). Of note, we detected a negative association of absolute [Ruminococcus] gnavus group abundance with lymphocyte rates, confirming our main result based on relative bacterial abundances (Fig. 3f). In the reverse

[Fig. 3 graphic, panels a–f. a, Model cartoon relating medications and treatments, host state (neutrophils, lymphocytes, monocytes) and microbiome to the daily change log(Wt+1) − log(Wt). b, PC 1 (EV: 48%) against PC 2 (EV: 23%) for the Δt = 1 day WBC dynamic data, with microbiota samples highlighted. c, Number of samples and number of recipients by graft type (PBSC, BM, TCD, cord), with validation cohorts marked. d, Posteriors for GCSF, MM and cetirizine, and e, for monocytes, eosinophils, neutrophils, lymphocytes and platelets, each as effect on Δneutrophils, Δlymphocytes and Δmonocytes with v-scores. f, Posteriors (mean, 50% and 95% HPDI) and v-scores for genera including Faecalibacterium, Ruminococcus 2, Akkermansia, unid. Lactobacillales, Veillonella, Bacteroides, [Clostridium] innocuum group, Staphylococcus, Parabacteroides and [R.] gnavus group.]
Fig. 3 | Bayesian inference reveals associations between the microbiota and dynamics of circulatory WBC counts. a, Cartoon of the model: observed changes in WBC counts between two consecutive days are associated with the current state of the host in the form of blood cell counts in circulation, immunomodulatory medications, clinical metadata and the state of the microbiome. b, Visualization of the WBC dynamics data. Scatter plot of the principal components (PC) of observed daily changes of neutrophils, lymphocytes and monocytes without (grey; n = 20,751 (after neutrophil engraftment), 81,253 (total)) and with (orange; n = 2,615 (after neutrophil engraftment), 6,297 (total)) available concurrent microbiota samples. EV, explained variance. c, Recipients of PBSCs (N = 312) provided most paired blood dynamics and microbiota samples (n = 995). Datasets from recipients of stem cells from TCD, bone marrow (BM) and umbilical cord (cord) grafts were used for validation. d–f, Bayesian inference results from PBSC graft recipient data. d, Posterior coefficient distributions of associations between treatments (colour shows more than 95% posterior density (PD) of coefficient estimates greater than zero (red) or less than zero (blue)). MM, mycophenolate mofetil. e, WBC counts. f, Relative abundances of microbial genera and daily changes (Δ) in neutrophils, lymphocytes and monocytes. The v-score is the number of validation cohorts confirming associations; it is set to zero if invalidated by validation cohorts (additional coefficients in Extended Data Fig. 4a–c). Unid., unidentified. g, One hundred microbiota samples with highest (left) or lowest (right) relative abundance of Faecalibacterium, Ruminococcus 2 and Akkermansia. h, Simulation of neutrophil dynamics in the presence of GCSF and microbiota compositions sampled from those with high (blue) or low (red) relative abundance of Faecalibacterium, Ruminococcus 2 and Akkermansia as shown in g. Lines show medians of 1,000 simulations and shaded regions show the interquartile range of simulated trajectories. i, Time until neutrophil counts first reach a density of 2,000 cells per μl in equivalent simulations without GCSF.
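The quantities behind Fig. 3 can be made concrete with a short sketch. It computes the daily change log(Wt+1) − log(Wt) shown in panel a, converts a log-scale coefficient into a percentage, and summarizes posterior samples by a highest posterior density interval. Several points are assumptions for illustration only: the function names are hypothetical, natural logarithms are assumed, the percentage conversion (exp(β) − 1) × 100 is one plausible reading of figures such as "+140%" rather than the paper's stated parameterization, and the interval is found by a simple narrowest-window search over sorted samples. The authors' actual model is described in their Methods and is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple


def daily_log_changes(counts: Sequence[float]) -> List[float]:
    """log(W[t+1]) - log(W[t]) for consecutive daily counts.

    Pairs containing a non-positive count are skipped, because the
    logarithm is undefined there (for example before engraftment).
    """
    changes = []
    for today, tomorrow in zip(counts, counts[1:]):
        if today > 0 and tomorrow > 0:
            changes.append(math.log(tomorrow) - math.log(today))
    return changes


def percent_effect(coefficient: float) -> float:
    """Percent change in the ratio W[t+1] / W[t] implied by an additive
    coefficient on the natural-log scale (an assumed interpretation)."""
    return (math.exp(coefficient) - 1.0) * 100.0


def hpdi(samples: Sequence[float], mass: float = 0.95) -> Tuple[float, float]:
    """Narrowest interval that contains at least `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # samples the interval must cover
    best = min(range(n - k + 1),
               key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]
```

Under these assumptions a coefficient of log(2.4) ≈ 0.875 corresponds to a +140% effect, and an interval whose two ends share a sign plays the role of the coloured (more than 95% posterior density) estimates in panels d–f.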
[Fig. 3 graphic, continued. f, Further genera: Clostridium sensu stricto 1, Rothia and Faecalitalea, shown as effect on Δneutrophils, Δlymphocytes and Δmonocytes. g, Relative abundance of Faecalibacterium, Ruminococcus 2 and Akkermansia by sample index (100 highest, 100 lowest). h, Simulated neutrophil count (×1,000 μl–1) against simulated days, + GCSF. i, Probability (%) against time to 2,000 neutrophils per μl (d), − GCSF.]

direction, we saw a positive association of lymphocyte counts with [R.] gnavus group growth rates. Ruminococcus gnavus is associated with inflammatory bowel diseases31 and autoimmune disorders10; our analysis suggests that it may drive high neutrophil-to-lymphocyte ratios that are broadly characteristic of poor disease outcomes in inflammatory bowel diseases32 and other conditions33,34.
Several of the bacterial taxa positively associated with WBC rates were obligate anaerobes, some of which produce cell-wall molecules1,35 and short-chain fatty acids36 that modulate immune responses and granulopoiesis37. Ruminococcus 2, for example, contains keystone species that release nutrients from complex dietary starch38, and such nutritional support from the microbiota improved haematopoietic reconstitution in mice21. To identify a similar association in our patients, we estimated the microbiota reconstitution potency of each sample (Methods). Shotgun metagenomic sequences from 124 of our samples revealed that samples with positive microbiota potency were enriched in cholate degradation, vitamin B1 synthesis and butanoate

Ruminococcus 2 and Akkermansia that we associated with increased WBC rates were also among those best reconstituted by auto-FMT20, potentially explaining the higher counts of neutrophils, monocytes and lymphocytes in auto-FMT-treated individuals.
Our analyses show that the gut microbiome is associated with immune cell dynamics in humans. The inferred associations should be interpreted as net effects, since they do not, for example, distinguish the effect of the microbiota on de novo haematopoiesis from its effect on other sources and sinks. Unlike the plausible role of obligate anaerobe fermenters in augmenting haematopoiesis via nutritional support21, the positive association detected between Staphylococcus and lymphocyte dynamics could instead result from reduced extravasation of T cells from circulation into the gut epithelium40, especially since high abundances of Staphylococcus are associated with low gut microbiota diversity (P < 0.001, Extended Data Fig. 9a), which indicates a depleted microbiota.
Nevertheless, our approach enables us to leverage the chronology of events and assess ‘mathematical causality’41. Owing to the observational nature of these data, there are risks of confounding, for example, from undetected infections or dietary components, which could explain some of the associations, but the close temporal correspondence41 between microbiota and WBC dynamics reduces the number of plausible confounders. Notwithstanding potential confounders, our results suggest candidate bacterial taxa that might improve immune reconstitution, and focused follow-up studies are required to evaluate their immunomodulatory efficacy. Members of Faecalibacterium, Ruminococcus12 in one study, and Akkermansia11 in another have been associated with better responses to anti–PD-1 immunotherapy, which suggested a disagreement regarding involved taxa42. Our results, however, identified Faecalibacterium, Ruminococcus 2 and Akkermansia as the taxa with the strongest associations with immune cell dynamics,
formation pathways (Extended Data Fig. 8). In line with evolutionary agreeing with the findings of both these previous studies that these
theory39, our results suggest that such broadly available microbial taxa are associated with human immune modulation. Furthermore,
traits may be co-opted by the host as part of the homeostatic interplay our work enables us to directly compare their inferred effect sizes with
between immune system and microbiota. The genera Faecalibacterium, the effects of immunomodulatory drugs. These genera are common in

healthy people43, but their abundance can fall below the detection limit in patients after HCT18. Realistic ranges of 3–5 orders of magnitude in bacterial relative abundances (Fig. 3g, Extended Data Fig. 9b, c) could yield effect sizes similar to the homeostatic feedbacks inferred between WBCs and immunomodulatory medications (for example, a change in Ruminococcus 2 from below the detection limit to 1% relative abundance was associated with a +67% change in neutrophils and a +63% change in lymphocytes). The effect sizes of gut bacteria may initially appear small relative to those of immunomodulatory drugs, but their effect could be considerable, as they refer to changes in exponential rates of WBCs and would therefore accumulate while those bacteria remain abundant. To demonstrate this accumulation over time, we simulated WBC dynamics using our posterior coefficient distributions (Methods). We simulated 1,000 time series for microbiota compositions chosen from the 100 samples highest or lowest in Faecalibacterium, Ruminococcus 2 and Akkermansia (Fig. 3g), in the presence (Fig. 3h) or absence (Fig. 3i) of GCSF administration. Simulations predict that microbiota enriched in these genera accelerate immune reconstitution, and reduce the time until neutrophils reach a level of more than 2,000 μl−1 in the absence of GCSF by 2.4 days, from a predicted 6.8 days (95% confidence interval (CI) (6.5, 7)) to 4.4 days (95% CI (4.3, 4.5)). Gut bacteria, together and over time, could therefore influence steady-state immune homeostasis considerably, even in individuals with less severely injured microbiomes.
In sum, our work links the human gut microbiota to the dynamics of the immune system via peripheral WBC dynamics. Our analysis uses WBCs counted directly from human subjects, which is a coarse-grained clinical analysis conducted at large scale, but it lacks details such as the subtypes of lymphocytes and other immune cells. Because our study is in humans, it fills an important gap at a critical time for microbiome research, when the clinical relevance of animal models of microbiome-immune interaction has been questioned44. By studying a large number of patients over time, we could infer and quantify the association between gut bacteria and systemic immune cell dynamics, and our results help to consolidate previous apparently contradictory findings11,12,42. Our demonstration that the microbiota influences systemic immunity in humans opens the door towards an exploration of potential microbiota-targeted interventions to improve immunotherapy and treatments for immune-mediated and inflammatory diseases8,10–12.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2971-8.

1. Mazmanian, S. K., Liu, C. H., Tzianabos, A. O. & Kasper, D. L. An immunomodulatory molecule of symbiotic bacteria directs maturation of the host immune system. Cell 122, 107–118 (2005).
2. Gomez de Agüero, M. et al. The maternal microbiota drives early postnatal innate immune development. Science 351, 1296–1302 (2016).
3. Olin, A. et al. Stereotypic immune system development in newborn children. Cell 174, 1277–1292 (2018).
4. Tan, T. G. et al. Identifying species of symbiont bacteria from the human gut that, alone, can induce intestinal Th17 cells in mice. Proc. Natl Acad. Sci. USA 113, E8141–E8150 (2016).
5. Deshmukh, H. S. et al. The microbiota regulates neutrophil homeostasis and host resistance to Escherichia coli K1 sepsis in neonatal mice. Nat. Med. 20, 524–530 (2014).
6. Ivanov, I. I. et al. Specific microbiota direct the differentiation of IL-17-producing T-helper cells in the mucosa of the small intestine. Cell Host Microbe 4, 337–349 (2008).
7. Geva-Zatorsky, N. et al. Mining the human gut microbiota for immunomodulatory organisms. Cell 168, 928–943 (2017).
8. Lloyd-Price, J. et al. Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases. Nature 569, 655–662 (2019).
9. Markey, K. A. et al. The microbe-derived short-chain fatty acids butyrate and propionate are associated with protection from chronic GVHD. Blood 136, 130–136 (2020).
10. Azzouz, D. et al. Lupus nephritis is linked to disease-activity associated expansions and immunity to a gut commensal. Ann. Rheum. Dis. 78, 947–956 (2019).
11. Routy, B. et al. Gut microbiome influences efficacy of PD-1-based immunotherapy against epithelial tumors. Science 359, 91–97 (2018).
12. Gopalakrishnan, V. et al. Gut microbiome modulates response to anti-PD-1 immunotherapy in melanoma patients. Science 359, 97–103 (2018).
13. Vétizou, M. et al. Anticancer immunotherapy by CTLA-4 blockade relies on the gut microbiota. Science 350, 1079–1084 (2015).
14. Matson, V. et al. The commensal microbiome is associated with anti-PD-1 efficacy in metastatic melanoma patients. Science 359, 104–108 (2018).
15. Tanoue, T. et al. A defined commensal consortium elicits CD8 T cells and anti-cancer immunity. Nature 565, 600–605 (2019).
16. Brandi, G. & Frega, G. Microbiota: overview and implication in immunotherapy-based cancer treatments. Int. J. Mol. Sci. 20, 2699 (2019).
17. Xin Yu, J., Hubbard-Lucey, V. M. & Tang, J. The global pipeline of cell therapies for cancer. Nat. Rev. Drug Discov. 18, 821–822 (2019).
18. Morjaria, S. et al. Antibiotic-induced shifts in fecal microbiota density and composition during hematopoietic stem cell transplantation. Infect. Immun. 87, e00206-19 (2019).
19. Peled, J. U. et al. Microbiota as predictor of mortality in allogeneic hematopoietic-cell transplantation. N. Engl. J. Med. 382, 822–834 (2020).
20. Taur, Y. et al. Reconstitution of the gut microbiota of antibiotic-treated patients by autologous fecal microbiota transplant. Sci. Transl. Med. 10, eaap9489 (2018).
21. Staffas, A. et al. Nutritional support from the intestinal microbiota improves hematopoietic reconstitution after bone marrow transplantation in mice. Cell Host Microbe 23, 447–457 (2018).
22. Savani, B. N. et al. Absolute lymphocyte count on day 30 is a surrogate for robust hematopoietic recovery and strongly predicts outcome after T cell-depleted allogeneic stem cell transplantation. Biol. Blood Marrow Transplant. 13, 1216–1223 (2007).
23. Scheiermann, C., Frenette, P. S. & Hidalgo, A. Regulation of leucocyte homeostasis in the circulation. Cardiovasc. Res. 107, 340–351 (2015).
24. Thompson, P. A. et al. Umbilical cord blood graft engineering: challenges and opportunities. Bone Marrow Transplant. 50 (Suppl 2), S55–S62 (2015).
25. Gabrilove, J. L. et al. Effect of granulocyte colony-stimulating factor on neutropenia and associated morbidity due to chemotherapy for transitional-cell carcinoma of the urothelium. N. Engl. J. Med. 318, 1414–1422 (1988).
26. Belkaid, Y. & Hand, T. W. Role of the microbiota in immunity and inflammation. Cell 157, 121–141 (2014).
27. Schirmer, M. et al. Linking the human gut microbiome to inflammatory cytokine production capacity. Cell 167, 1125–1136 (2016).
28. McLoughlin, K., Schluter, J., Rakoff-Nahoum, S., Smith, A. L. & Foster, K. R. Host selection of microbiota via differential adhesion. Cell Host Microbe 19, 550–559 (2016).
29. Hooper, L. V., Littman, D. R. & Macpherson, A. J. Interactions between the microbiota and the immune system. Science 336, 1268–1273 (2012).
30. Palm, N. W. et al. Immunoglobulin A coating identifies colitogenic bacteria in inflammatory bowel disease. Cell 158, 1000–1010 (2014).
31. Henke, M. T. et al. Ruminococcus gnavus, a member of the human gut microbiome associated with Crohn’s disease, produces an inflammatory polysaccharide. Proc. Natl Acad. Sci. USA 116, 12672–12677 (2019).
32. Okba, A. M. et al. Neutrophil/lymphocyte ratio and lymphocyte/monocyte ratio in ulcerative colitis as non-invasive biomarkers of disease activity and severity. Auto Immun. Highlights 10, 4 (2019).
33. Choi, S.-J. et al. High neutrophil-to-lymphocyte ratio predicts short survival duration in amyotrophic lateral sclerosis. Sci. Rep. 10, 428 (2020).
34. Gao, Y. et al. Neutrophil/lymphocyte ratio is a more sensitive systemic inflammatory response biomarker than platelet/lymphocyte ratio in the prognosis evaluation of unresectable pancreatic cancer. Oncotarget 8, 88835–88844 (2017).
35. Hergott, C. B. et al. Peptidoglycan from the gut microbiota governs the lifespan of circulating phagocytes at homeostasis. Blood 127, 2460–2471 (2016).
36. Smith, P. M. et al. The microbial metabolites, short-chain fatty acids, regulate colonic Treg cell homeostasis. Science 341, 569–573 (2013).
37. Balmer, M. L. et al. Microbiota-derived compounds drive steady-state granulopoiesis via MyD88/TICAM signaling. J. Immunol. 193, 5273–5283 (2014).
38. Ze, X., Duncan, S. H., Louis, P. & Flint, H. J. Ruminococcus bromii is a keystone species for the degradation of resistant starch in the human colon. ISME J. 6, 1535–1543 (2012).
39. Foster, K. R., Schluter, J., Coyte, K. Z. & Rakoff-Nahoum, S. The evolution of the host microbiome as an ecosystem on a leash. Nature 548, 43–51 (2017).
40. Fu, Y.-Y. et al. T cell recruitment to the intestinal stem cell compartment drives immune-mediated intestinal damage after allogeneic transplantation. Immunity 51, 90–103 (2019).
41. Gerber, G. K. The dynamic microbiome. FEBS Lett. 588, 4131–4139 (2014).
42. Jobin, C. Precision medicine using microbiota. Science 359, 32–34 (2018).
43. The Integrative HMP (iHMP) Research Network Consortium. The integrative human microbiome project. Nature 569, 641–648 (2019).
44. Walter, J., Armet, A. M., Finlay, B. B. & Shanahan, F. Establishing or exaggerating causality for the gut microbiome: lessons from human microbiota-associated rodents. Cell 180, 221–232 (2020).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© The Author(s), under exclusive licence to Springer Nature Limited 2020

Article
Methods

No statistical methods were used to predetermine sample size. The experiments were not randomized, except for the auto-FMT trial as explained in NCT02269150. The investigators were not blinded to allocation during experiments and outcome assessment.

Ethics approval and informed consent
The participants in the auto-FMT trial (NCT02269150) provided written informed consent to participate in the trial (#14-025). Participants in the observational cohorts at both MSK and at Duke provided written informed consent for the use of their faecal specimens and clinical data. The use and analysis of these specimens for the work herein was approved by Institutional Research Boards at both institutions: MSK (#16-834) and Duke (PRO0006268 and Pro00050975).

Complete blood count collection and characterization
Absolute WBC count data were obtained from routine complete blood counts ordered by clinicians during normal clinical practice, used to obtain informative diagnostic and monitoring information. Blood samples received in the clinical haematology laboratory were analysed using Sysmex XN automated haematology analysers (Sysmex) and, when needed based on specific flags and parameters as per MSKCC standard operating procedures, were validated manually using the Sysmex DI-60 Slide Processing System or CellaVision DM9600 Automated Digital Morphology System (Sysmex).

16S rRNA gene amplification and multiparallel sequencing
For each sample, duplicate 50-μl PCRs were performed, each containing 50 ng purified DNA, 0.2 mM deoxynucleotide triphosphates, 1.5 mM MgCl2, 2.5 U Platinum Taq DNA polymerase, 2.5 μl of 10× PCR buffer, and 0.5 μM of each primer designed to amplify the V4–V5 region: 563F (5′-nnnnnnnnNNNNNNNNNNNNAYTGGGYDTAAAGNG-3′) and 926R (5′-nnnnnnnnNNNNNNNNNNNNCCGTCAATTYHTTTRAGT-3′). A unique 12-base Golay barcode (Ns) precedes the primers for sample identification45, and one to eight additional nucleotides (n) were placed in front of the barcode to offset the sequencing of the primers. Cycling conditions were 94 °C for 3 min, followed by 27 cycles of 94 °C for 50 s, 51 °C for 30 s, and 72 °C for 1 min. For the final elongation step, 72 °C for 5 min was used. Replicate PCRs were pooled, and amplicons were purified using the QIAquick PCR Purification Kit (Qiagen). PCR products were quantified and pooled at equimolar amounts before Illumina barcodes and adaptors were ligated, using the Illumina TruSeq Sample Preparation protocol. The completed library was sequenced on an Illumina MiSeq platform following the Illumina recommended procedures with a paired-end 250 × 250-bp kit.

Sequence analysis
The 16S (V4-V5) paired-end reads were merged and demultiplexed. Amplicon sequence variants (ASVs) were identified using the divisive amplicon denoising algorithm (DADA2) pipeline including filtering and trimming of the reads46. Reads were trimmed to the first 180 bp or the first point with a quality score Q < 2. Reads were removed if they contained ambiguous nucleotides (N) or if two or more errors were expected based on the quality of the trimmed read. We assigned taxonomy to ASVs using an octamer-based classifier trained by IDTaxa47 using the SILVA database48.

Quantification of total microbiota density per gram of stool and estimation of total genus abundances
qPCR was performed on DNA extracted from 1 g wet weight of a stool sample using DyNAmo SYBR Green qPCR kit (Finnzymes) and 0.2 μM of the universal bacterial primer 8F (5′-AGAGTTTGATCCTGGCTCAG-3′) and the broad-range bacterial primer 338R (5′-TGCTGCCTCCCGTAGGAGT-3′). Standard curves were prepared by serial dilution of the PCR blunt vector (Invitrogen) containing 1 copy of the 16S rRNA gene. Cycling conditions were 95 °C for 10 min followed by 40 cycles of 95 °C for 30 s, 52 °C for 30 s and 72 °C for 1 min. We multiplied the relative abundances of taxa obtained from 16S amplicon sequencing by the measured total 16S rRNA gene counts per gram of stool to estimate their total abundance per gram of stool (supplementary information). Of note, this does not account for 16S copy-number variation between taxa, but the observed dynamic ranges in total abundances of taxa in our dataset span up to nine orders of magnitude, exceeding the potential inaccuracies due to copy-number variation.

Diversity calculations
Microbiome alpha-diversity was measured by the inverse Simpson (IS) index of a sample. It was calculated by $IS_i = 1 / \sum_{j=1}^{N} p_{ij}^2$, where $p_{ij}$ is the relative abundance of the jth ASV out of N total ASVs in sample i.

Linear mixed-effects model of WBC counts
To study the effect of auto-FMT on WBCs, we investigated the WBC counts of 24 subjects enrolled in this trial from the day of neutrophil engraftment until 100 days after engraftment. FMT was performed on different days relative to neutrophil engraftment. Thus, we performed an analogous analysis to that conducted in the original publication that demonstrated how FMT re-established a diverse microbiome in the post-FMT period20. To determine whether WBC counts differed after FMT, we used a linear mixed-effects model of WBC counts, y, modelled as a function of the FMT treatment as well as patient- and time-point-specific random effects. We included random intercept terms for each day i and each patient j, and a fixed-effects term for the post-FMT period with associated coefficient ‘armpost’, using the indicator variable FMT, which has a value of 1 when a patient was from the FMT-treated arm of the trial and day was greater than or equal to the day of the FMT procedure. We conducted independent analyses for neutrophil, lymphocyte and monocyte counts. This resulted in the following model of a cell count, y, for patient j on day i:

$$y_{ij} = \beta_0 + \mathrm{armpost} \times \mathrm{FMT}_{ij} + \mathrm{day}_i + \mathrm{patient}_j + \varepsilon_{ij}, \quad i = 0, \ldots, D, \quad j = 1, \ldots, P$$

with prior distributions $\mathrm{day}_i \sim N(0, \sigma^2_{\mathrm{day}})$ and $\mathrm{patient}_j \sim N(0, \sigma^2_{\mathrm{patient}})$, independent error $\varepsilon_{ij} \sim N(0, \sigma^2)$ and fixed intercept $\beta_0$, for the D days after neutrophil engraftment and P individuals (D = 100, P = 24). For convenience of those interested in reanalysing our data, the part of our data concerning the auto-FMT analysis is available in tidy format (Supplementary Information), and the analysis code, written in the R programming language, is available as an exported notebook (fmt_effect_on_wbc.pdf) on GitHub: https://github.com/jsevo/wbcdynamics_microbiome/ (ref. 49). We conducted an additional analysis with ‘day’ as a continuous predictor, which did not change our conclusions (Supplementary Information).

Dynamic systems analyses
We analysed factors associated with the observed changes of absolute counts of neutrophils, lymphocytes and monocytes between two days. In the following we describe how chronology of events and biological samples were encoded, and the models used to infer a role of medications, clinical parameters and the microbiome on dynamics of WBCs.
To reveal factors that associate with day-to-day changes in WBC counts, we started from a first-order differential equation of WBC (W) dynamics:

$$\frac{dW}{dt} = W \left( gr + \sum_{j=1}^{P} \beta_j X_j \right)$$

where gr represents the intercept, that is, the baseline rate of change during immune reconstitution, and $\beta_j$ are the to-be-estimated
coefficients of the P predictors $X_j$, $j = 1, \ldots, P$, of the WBC dynamics. This equation was then linearized to

$$\frac{d(\ln W)}{dt} = gr + \sum_{j=1}^{P} \beta_j X_j$$

and we parameterized the corresponding discrete difference equation:

$$\frac{\Delta \ln(W)}{\Delta t} = gr + \sum_{j=1}^{P} \beta_j X_j$$

where Δln(W) is the log-difference between single days of neutrophil, lymphocyte or monocyte counts, and Δt = 1 for all intervals. Predictors include the counts of neutrophils, lymphocytes, monocytes, eosinophils and platelets during an interval (homeostatic feedbacks), immunomodulatory medication and clinical observations such as a blood stream infection and the onset of graft versus host disease, HCT parameters such as graft types and conditioning regimens, and, additionally, the microbiota composition in stage 2 of our analysis (see Supplementary Information for data exclusion and additional details on interval definitions). Importantly, by parameterizing a dynamic equation and analysing rates of change, our coefficient estimates have an immediate causal interpretation within our modelling framework (that is, a $\beta_j > 0$ implies that higher levels of the corresponding $X_j$ increase the rate of change of WBC type, W). To differentiate such results from other associations, they have been described by the term ‘mathematical causality’41.

Stage 1 analysis
This includes feature selection: identifying medications and clinical observations associated with WBC dynamics from patients without microbiome data. Stage 1 uses data of patients without any available microbiome samples and the following model of WBC changes, y:

$$y = gr + \sum_{j=1}^{p} \beta_j X_j,$$

with intercept, gr. The predictors, X, include dummy variables for the HCT graft type, patients’ age on the date of HCT, sex, 13 most frequently observed positive blood cultures with remaining other blood stream infections grouped into a separate category ‘other infections’, an indicator for the onset of graft versus host disease, administrations of 55 different, most common immunomodulatory medications and platelet transfusion events, and HCT conditioning intensity regimens as well as the log-transformed geometric mean counts of neutrophils, lymphocytes, monocytes, eosinophils and platelets during the respective interval. We used elastic net regression50 for feature selection using the sklearn package for the Python programming language51. For elastic net regression with 50% L1 penalty, predictors were scaled between zero and 1, and we used tenfold cross validation (that is, leaving out 10% of patients at each cross-validation step) to choose the regularization strength, λ, solving for

$$\operatorname{argmin}_{gr,\,\beta} \left[ \frac{1}{2N} \sum_{i=1}^{N} \left( y_i - gr - \sum_{j=1}^{p} x_{ij}\beta_j \right)^2 + \frac{1}{2} \lambda \sum_{j=1}^{p} |\beta_j| + \frac{1}{2} \lambda \sum_{j=1}^{p} \beta_j^2 \right]$$

Stage 1 yielded a sparse coefficient matrix of predictors used to design the model in stage 2.

Stage 2
Stage 2 of the analysis comprises expanded analysis on patients with microbiome data. To identify associations between microbiota and WBC dynamics, we conducted an analogous, Bayesian regression using the package PyMC3 for the Python programming language52.
Stage 1 identified important differences between transplant types, and we therefore stratified our data into four cohorts according to the source of the stem-cell graft. Using data independently from each cohort, we applied ‘no u-turn’ sampling53 to produce 10,000 posterior samples from 5 independent MCMC chains that parameterized the model:

$$y \sim N(\mu, \sigma^2)$$

$$\mu = gr + \sum_{j=1}^{\hat{P}} x_j \beta_j$$

with uninformative prior distributions

$$gr \sim N(\text{mean} = 0, \text{standard deviation} = 100)$$

$$\beta_j \sim N(\text{mean} = 0, \text{standard deviation} = 100)$$

$$\sigma \sim \text{half-Cauchy}(\text{beta} = 2)$$

where y is the observed daily change of a focal WBC type as in stage 1, normally distributed with mean μ, and σ is the model uncertainty with a thick-tailed half-Cauchy prior (importantly, our posterior estimates do not depend on this choice as we obtain the same results with an inverse gamma prior, Extended Data Fig. 10b). μ was a function of the baseline growth rate gr, and predictors $\hat{P}$: medications with non-zero coefficients in stage 1, the WBC counts, patient age and sex, and HCT conditioning intensities; additionally, $\hat{P}$ now included the log-abundances of microbial genera as measured by 16S sequencing from DNA in the stool collected on the second day of a daily interval (see supplementary information for details). We considered taxa that were among the 100 most abundant, or had reached maximum relative abundances of at least 10%, and selected those that were non-zero in more than 75% of our samples. WBC counts and microbiota data present during a daily interval were log-transformed, and zeros were filled with half of the minimum observed non-zero counts (that is, $0.5 \times 10^{3}$ and $2 \times 10^{-6}$, respectively). We focused on the largest cohort (PBSC) and used the independent inference results from TCD, bone marrow and cord cohorts for validation.

Validation score
Coefficients learned from the PBSC cohort were assigned a validation score based on the results obtained from the other three MSK patient cohorts. Our requirements for validation were conservative; we required evidence from our validation datasets as well as absence of counter evidence. For regression results from each of the validation graft type cohorts, that is, TCD, bone marrow and cord, we checked if a coefficient had more than 75% probability (50%HPDI) to have the same sign as the mean of the PBSC coefficient posterior for a given predictor. If so, this was considered evidence of validation, and we summed the evidence over the three validation sets (that is, maximum score of 3, 1 from each of TCD, bone marrow and cord cohorts). Conversely, if we found more than 75% probability among any of the validation datasets that a given predictor had the opposite sign as the posterior mean calculated from PBSC data, this was considered counter evidence and the validation score was always set to zero.

Analysis of WBC dynamics with absolute bacterial abundances as predictors instead of relative abundances
We conducted an ordinary least-squares regression using the statsmodels package in the Python programming language of the same model as in the main Bayesian analysis using total bacterial abundances as predictors. This was only possible on a subset of 389 neutrophil, 331 lymphocyte and 376 monocyte rate observations from PBSC patients.
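The scoring rule in the 'Validation score' subsection above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function names and the toy posterior draws are hypothetical, and the plain fraction of posterior samples on each side of zero is used as an assumed stand-in for the HPDI-based check mentioned in the text.

```python
from typing import Dict, Sequence


def fraction_with_sign(samples: Sequence[float], sign: int) -> float:
    """Fraction of posterior samples whose sign equals `sign` (+1 or -1)."""
    return sum(1 for s in samples if s * sign > 0) / len(samples)


def validation_score(
    pbsc_samples: Sequence[float],
    validation_samples: Dict[str, Sequence[float]],
    threshold: float = 0.75,
) -> int:
    """Score one predictor: +1 per validation cohort that agrees in sign with
    the PBSC posterior mean; counter-evidence in any cohort forces zero."""
    pbsc_mean = sum(pbsc_samples) / len(pbsc_samples)
    ref_sign = 1 if pbsc_mean > 0 else -1  # assumption: a zero mean counts as negative
    score = 0
    for samples in validation_samples.values():
        if fraction_with_sign(samples, -ref_sign) > threshold:
            return 0  # counter-evidence overrides any supporting evidence
        if fraction_with_sign(samples, ref_sign) > threshold:
            score += 1
    return score


# Toy posterior draws for a single coefficient (made-up numbers).
pbsc = [0.2, 0.3, 0.1, 0.4]
cohorts = {
    "TCD": [0.1, 0.2, 0.3, 0.05],           # all positive: evidence
    "bone_marrow": [0.1, -0.1, 0.2, -0.2],  # split: neither evidence nor counter-evidence
    "cord": [0.3, 0.1, 0.2, 0.4],           # all positive: evidence
}
example_score = validation_score(pbsc, cohorts)
```

Returning zero as soon as one cohort shows counter-evidence mirrors the conservative requirement stated in the text; a real analysis would use many more posterior draws per cohort.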
where η is the total number of observed daily log changes in genera and WBCs, and ρ the total number of predictors. This yielded a strongly regularizing λs, and thus few predictors. To characterize potential bidirectional relationships between WBC counts and the gut microbiota, we iteratively reduced the regularization strength until the strongest interaction between microbiota and WBC dynamics, that is, Faecalibacterium with neutrophil dynamics, was detected. We then re-ran the regression with this reduced regularization strength λr.

Shotgun sequencing
Sequencing of 124 post-neutrophil engraftment samples was conducted on the Illumina HiSeq platform. For details and the processing of the FASTQ files, see supplementary information. We used the HUMAnN2 pipeline54 with default settings for functional profiling of our samples, with the UniRef90 database and ChocoPhlAn for alignment, and we renormalized our samples by library depth to copies per million. We used MetaCyc to obtain stratified and unstratified pathway abundances.

Forwards simulation of predicted immune system reconstitution kinetics
To assess the impact of the estimated microbiota coefficients on immune system dynamics, we conducted 1,000 simulations of the system of 3 differential equations describing the dynamics of neutrophils, lymphocytes and monocytes. We ran 1,000 simulations four times: in the presence and absence of GCSF, each with microbiota compositions enriched or depleted in Faecalibacterium, Ruminococcus 2 and Akkermansia. To identify these compositions, we ranked the observed microbiota compositions by these taxa, and chose randomly either from the top or bottom 100. The coefficients for WBC interactions, interactions with the microbiota and the effect of GCSF were sampled from our posterior coefficient distributions. Using these coefficients sampled at the start of the simulation, and using 50 cells μl−1 of neutrophils, lymphocytes and monocytes as initial values, we simulated these differential equations forwards in time using the odeint function of the scipy package for the Python programming language.
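The forward simulation described above can be illustrated with a small self-contained sketch. It is not the authors' implementation: it replaces scipy's odeint with explicit Euler steps on the log scale so that it runs with the standard library only, it uses a single fixed set of coefficients rather than draws from posterior distributions, and every coefficient value, as well as the function names, is made up for illustration. The microbiota and GCSF contributions are collapsed into one assumed constant term per cell type (`external`).

```python
import math
from typing import List, Optional, Sequence, Tuple

Trajectory = List[Tuple[float, List[float]]]


def simulate_log_counts(
    w0: Sequence[float],
    growth: Sequence[float],
    feedback: Sequence[Sequence[float]],
    external: Sequence[float],
    days: float,
    dt: float = 0.01,
) -> Trajectory:
    """Euler integration of
    d ln(W_k)/dt = growth[k] + sum_m feedback[k][m] * ln(W_m) + external[k].
    Returns a list of (time, counts) pairs."""
    ln_w = [math.log(w) for w in w0]
    n = len(ln_w)
    trajectory: Trajectory = [(0.0, list(w0))]
    for step in range(1, int(round(days / dt)) + 1):
        rates = [
            growth[k] + sum(feedback[k][m] * ln_w[m] for m in range(n)) + external[k]
            for k in range(n)
        ]
        ln_w = [x + r * dt for x, r in zip(ln_w, rates)]
        trajectory.append((step * dt, [math.exp(x) for x in ln_w]))
    return trajectory


def time_to_threshold(trajectory: Trajectory, index: int, threshold: float) -> Optional[float]:
    """First simulated time at which cell type `index` reaches `threshold`."""
    for t, counts in trajectory:
        if counts[index] >= threshold:
            return t
    return None


# Hypothetical coefficients for neutrophils, lymphocytes and monocytes.
initial = [50.0, 50.0, 50.0]              # cells per microlitre, as in the text
growth = [0.8, 0.3, 0.3]                  # made-up baseline rates per day
feedback = [[-0.05, 0.0, 0.0],            # made-up homeostatic self-feedback
            [0.0, -0.05, 0.0],
            [0.0, 0.0, -0.05]]
enriched = simulate_log_counts(initial, growth, feedback, [0.2, 0.1, 0.1], days=15.0)
depleted = simulate_log_counts(initial, growth, feedback, [0.0, 0.0, 0.0], days=15.0)
t_enriched = time_to_threshold(enriched, 0, 2000.0)
t_depleted = time_to_threshold(depleted, 0, 2000.0)
```

With these made-up coefficients the enriched scenario crosses 2,000 neutrophils per μl earlier than the depleted one, which is the qualitative behaviour the simulations in the text are designed to quantify. Repeating the run for many coefficient draws and summarizing the resulting times would approximate the 1,000-simulation design.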
Validation on data from Duke University
We analysed 9,603 blood samples with 25,581 associated administrations of immunomodulatory medications, and 741 microbiota samples from Duke as an orthogonal dataset to validate our findings. The temporal resolution of this data was much lower, and after filtering for samples from the relevant post-neutrophil engraftment period, and by requiring daily intervals, 83 valid, complete data points were available. Using these data, we correlated daily blood cell changes individually in univariate, or jointly in a partial least squares regression, with those predictors that achieved more than 95% probability density in the positive or negative domain in the PBSC data regression. For each of these predictors, we present the sign of slopes and Bonferroni-corrected P values from individual linear regressions.

Joint analysis of the effect of antibiotics and WBC counts on the microbiota and the microbiota and immunomodulatory medications on WBC counts
Analogous to stage 1, we performed cross-validated, regularized linear regressions (ElasticNet) using the scikit-learn package for the Python programming language to jointly estimate the association network between microbiota and circulatory WBCs. For this, we constructed a block matrix X of predictor matrices $X_i$ that include the absolute bacterial abundances, drug data (antibiotics for bacterial dynamics and immune modulators for WBC dynamics), as well as the counts of WBCs and a separate intercept term per block. Each block $X^{l}_{n_l, p_l}$, with $n_l$ observations and $p_l$ predictors (l = 0,...,k), on the diagonal of X corresponds to the indices of the observed daily log-changes of one of the 41 bacterial genera considered in our main analysis or the log changes in neutrophil, lymphocyte and monocyte counts from PBSC patients contained in Y (in total we calculated 15,833 rates from 256 patients). Our regression problem can thus be written as:

$$\operatorname{argmin}_{\beta} \, (Y - X\beta) \quad \text{where} \quad X = \begin{pmatrix} X^{0}_{n_0, p_0} & \cdots & 0_{n_0, p_k} \\ \vdots & \ddots & \vdots \\ 0_{n_k, p_0} & \cdots & X^{k}_{n_k, p_k} \end{pmatrix}$$

with k = 44, that is, 41 bacterial genera and 3 WBC types, the to-be estimated coefficient vector β and 0 the zero matrix. This system is underdetermined and we therefore chose the same approach as in stage 1, elastic net regression, for feature selection. Predictors were scaled between zero and 1, and we used threefold cross validation, leaving out one-third of the patients at each iteration to identify a global regularization strength λ, solving for

$$\operatorname{argmin}_{\beta} \left[ \frac{1}{2\eta} \sum_{i=1}^{\eta} \left( y_i - \sum_{j=1}^{\rho} x_{ij}\beta_j \right)^2 + \frac{1}{2} \lambda \sum_{j=1}^{\rho} |\beta_j| + \frac{1}{2} \lambda \sum_{j=1}^{\rho} \beta_j^2 \right]$$

Statistical analysis of shotgun data
We calculated the predicted microbiota potency score for each sample and separately for neutrophils, lymphocytes and monocytes, by multiplying the abundances of taxa in each of the 124 samples with the corresponding posterior coefficients obtained from the PBSC inference. To distinguish the sets of metabolic functions that separate samples with positive and negative predicted potencies, we converted the pathway abundances into presence/absence profiles. We performed a linear discriminant analysis between positive and negative potency samples with a least squares solver and automatic shrinkage using the Ledoit–Wolf lemma, as implemented in the sklearn package for the Python programming language51. To assess differences in the presence or absence of pathways between samples with positive and negative potency, we used Fisher’s exact test for each pathway.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
All data supporting the findings of this study are available within the paper and its Supplementary Information files. The data used in our study are organized in Excel-compatible comma-separated value files as Supplementary Tables (data-tables.zip). All sequencing data have been made available publicly, and the NCBI SRA accession numbers are listed in the Supplementary Tables. Metadata and processed sequencing data are made available on a public repository via Figshare: meta data, https://doi.org/10.6084/m9.figshare.12016986.v4; samples, https://doi.org/10.6084/m9.figshare.12016983.v4; 16S counts, https://doi.org/10.6084/m9.figshare.12016989.v3; and 16S taxonomy, https://doi.org/10.6084/m9.figshare.12016992.v1.

Code availability
All of the steps of the analyses that were performed in this study are described in detail to allow reproduction of the results. Relevant analysis code is available publicly at https://github.com/jsevo/wbcdynamics_microbiome.

45. Caporaso, J. G. et al. Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. ISME J. 6, 1621–1624 (2012).
46. Callahan, B. J. et al. DADA2: high-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).
47. Murali, A., Bhargava, A. & Wright, E. S. IDTAXA: a novel approach for accurate taxonomic classification of microbiome sequences. Microbiome 6, 140 (2018).
48. Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41, D590–D596 (2013).
49. Pinheiro, J. C., Bates, D. M., DebRoy, S. S. & Sarkar, D. nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-150 (2013).
50. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996).
51. Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825 (2011).
52. Salvatier, J., Wiecki, T. V. & Fonnesbeck, C. Probabilistic programming in Python using PyMC3. PeerJ Comput. Sci. 2, e55 (2016).
53. Hoffman, M. D. & Gelman, A. The No-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo. J. Mach. Learn. Res. 15, 1593–1623 (2014).
54. Franzosa, E. A. et al. Species-level functional profiling of metagenomes and metatranscriptomes. Nat. Methods 15, 962–968 (2018).

Acknowledgements We thank M. Lipsitch, S. B. Andersen, K. R. Foster, J. K. Sia, E. G. Pamer, K. Coyte, S. Mitschka and the members of the Xavier lab for helpful discussion and comments on the manuscript. This work was supported by the National Institutes of Health (NIH) grants U01 AI124275, R01 AI137269 and U54 CA209975 to JBX, by the MSKCC Cancer Center Core Grant P30 CA008748, the Parker Institute for Cancer Immunotherapy at Memorial Sloan Kettering Cancer Center, the Sawiris Foundation, the Society of Memorial Sloan Kettering Cancer Center, MSKCC Cancer Systems Immunology Pilot Grant and Empire Clinical Research Investigator Program. M.S. received funding from the Burroughs Wellcome Fund Postdoctoral Enrichment Program, the Damon Runyon Physician-Scientist Award, and the Robert Wood Johnson Foundation. T.M.H. is investigator in the Pathogenesis of Infectious Diseases from the Burroughs Wellcome Fund, and funded via an award from Geoffrey Beene Foundation, and NIH RO1 AI093808. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author contributions J.S. and J.B.X. wrote the manuscript. J.S. and J.B.X. designed the analyses with expert help from R.N. J.U.P. and Y.T. contributed to the clinical data preparation, B.P.T. provided the 16S data-processing pipelines, K.A.M., M.S., A.S., S.M., M.F., M.S.P., T.M.H., M.-A.P. and M.R.M.v.d.B. provided clinical context and helped with variable selection, N.J.C., M.L., L.B., A.B. and A.D.S. provided clinical and other data from Duke, A.D. provided the shotgun processing pipelines. E.F., L.A.A. and R.J.W. processed patients’ stool samples, including for 16S sequencing, shotgun metagenomics and qPCR quantification of total 16S rRNA gene. All authors contributed to the writing and interpretation of the results.

Competing interests M.R.M.v.d.B. and J.U.P. received financial support from Seres Therapeutics. M.-A.P. has received honoraria from AbbVie, Bellicum, Bristol-Myers Squibb, Incyte, Merck, Novartis, Nektar Therapeutics, and Takeda, research support for clinical trials from Incyte, Kite (Gilead) and Miltenyi Biotec, and serves on data and safety monitoring boards for Servier and Medigene and scientific advisory boards for MolMed and NexImmune.

Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2971-8.
Correspondence and requests for materials should be addressed to J.S. or J.B.X.
Peer review information Nature thanks Henrik Nielsen and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Article
Extended Data Fig. 1 | Blood cell counts over time. a, WBC counts and platelet counts per graft source over the first 100 days post HCT per day relative to HCT
from N = 2,235 adult patients (detailed demographics in supplementary Table 1); lines: mean, shaded: ± standard deviations. b, Data exclusion diagram.
Extended Data Fig. 2 | FMT increases WBC counts. a, HCT patient who received an autologous faecal microbiota transplant (auto-FMT, dashed red line) that restored commensal microbial families and ecological diversity in the gut microbiota, with concurrent cell counts of peripheral neutrophils, lymphocytes and monocytes and immunomodulatory drug administrations. b, Total WBC counts in 24 enrolled patients (10 control, 14 treated) post-neutrophil engraftment; vertical lines indicate randomization dates. c, Weekly mean WBC counts aligned to the randomization date (FMT-treated: red, control: black). Line: mean per week, shaded region: 95% CI. d, Coefficient estimates (mean vs. mean + FMT effect) from linear mixed effects models of total WBC counts over time indicate an auto-FMT-induced increase of WBCs (βFMT: P = 7 × 10⁻¹⁴). e–g, Respectively: neutrophil, lymphocyte and monocyte count trajectories of 24 FMT trial patients. Thin lines: raw data (blue: post-FMT); thick black: mean per day, thick blue: mean+post-FMT coefficient. Means and confidence intervals (shaded region) without (black) and after FMT (blue), as well as the coefficient estimate for FMT treatment and its P value from a linear mixed effects model relating cell counts over time to the FMT treatment (Methods).
Extended Data Fig. 3 | Results of the feature selection stage 1 regression. a–c, Stage 1 regression on neutrophil, lymphocyte, and monocyte dynamics, respectively, on patients without microbiome data. Coefficients from tenfold cross-validated elastic net regression on daily changes in neutrophils. gr: intercept; TCD: T cell depleted graft (ex-vivo) by CD34+ selection; PBSC: peripheral blood stem cells; BM: bone marrow; cord: umbilical cord blood; NONABL: Nonmyeloablative; REDUCE: reduced-intensity conditioning regimen; F: female; N: patients, n: samples (daily changes in neutrophils).
Extended Data Fig. 4 | Additional coefficients, posterior convergence evaluation and validation. a–c, Additional posterior coefficient estimates of medications, additional genera and HCT metadata from the Bayesian stage 2 regression, see also Fig. 3. REDUCE: reduced-intensity conditioning regimen; NONABL: non-myeloablative conditioning regimen. F: female. d–f, Posterior sampling convergence. Histograms of the ranked posterior draws from the model of neutrophil, lymphocyte and monocyte dynamics, respectively, in PBSC patients (ranked over all chains), plotted separately for each chain, show no substantial differences between chains. g–i, Predictors of WBC dynamics using data from patients treated at Duke. Heatmaps indicate the slope coefficients from individual univariate regressions of microbiome and clinical predictors with changes in neutrophils, lymphocytes and monocytes, and for comparison the corresponding coefficient signs from the Bayesian multiple linear regressions in stage 2 of the analysis of WBC dynamics in MSK patients (Fig. 3). P values were adjusted for multiple hypothesis testing using Bonferroni correction: ***P < 0.001, **P < 0.01, *P < 0.05; P > 0.05: n.s. Sign of coefficients from MSK PBSC patients for comparison. j, Equivalent validation analysis from patients treated at Duke using partial least squares regression of microbiome and clinical predictors identified in stage 2 of our analysis on daily changes in neutrophils, lymphocytes and monocytes.
Extended Data Fig. 5 | Validation using absolute instead of relative abundance bacterial genus data. a–d, Validation analysis of the main model using absolute bacterial abundances as predictors instead of relative abundances in Fig. 3. Results show inferred coefficients and P values from multiple linear regressions. One regression per analysed WBC type dynamics, that is, neutrophil, lymphocyte and monocyte daily log-changes, was conducted, and coefficients for medications (a), WBC feedbacks (b), metadata (c) and total genus abundances (d) are shown. This was possible for only a subset of the data used in the main analysis for which we obtained absolute bacterial abundance estimates (Methods), n: samples, N: patients.
Extended Data Fig. 6 | Jointly inferred association network between WBC and bacterial genus dynamics. Strong regularization yields few non-zero
coefficients and antibiotics dominate the dynamics.
Extended Data Fig. 7 | Jointly inferred association network between WBC and bacterial genus dynamics with reduced regularization. Reducing regularization strength (Methods) indicates potential bidirectional feedbacks, for example, between lymphocytes and [Ruminococcus] gnavus group (highlighted green boxes, and cartoon).
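The effect of regularization strength on a block-diagonal elastic net, as described in the captions for Extended Data Figs. 6 and 7, can be sketched on toy data. This is a minimal illustration under stated assumptions, not the authors' pipeline: it replaces scikit-learn's ElasticNet with a plain coordinate-descent solver, assumes a penalty of the form (λ/2)Σ|β_j| + (λ/2)Σβ_j², omits the threefold cross-validation used to choose λ, and uses invented data and hypothetical names (`block_diag`, `elastic_net`, `toy_block`).

```python
import random


def block_diag(blocks):
    """Place each predictor block on the diagonal of one matrix, zeros elsewhere."""
    total_cols = sum(len(b[0]) for b in blocks)
    rows, offset = [], 0
    for b in blocks:
        width = len(b[0])
        for r in b:
            rows.append([0.0] * offset + list(r) + [0.0] * (total_cols - offset - width))
        offset += width
    return rows


def soft_threshold(rho, t):
    """Shrink rho towards zero by t; values within [-t, t] become exactly zero."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def elastic_net(X, y, lam, n_iter=200):
    """Coordinate descent for 1/2*RSS + (lam/2)*sum|b_j| + (lam/2)*sum b_j^2."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, starting from beta = 0
    z = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue
            # Inner product of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n))
            new = soft_threshold(rho, lam / 2.0) / (z[j] + lam)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def toy_block(true_coefs, n=30):
    """Predictors scaled to [0, 1] and a noisy response; invented data."""
    rows = [[random.random() for _ in true_coefs] for _ in range(n)]
    y = [sum(c * v for c, v in zip(true_coefs, r)) + random.gauss(0, 0.05) for r in rows]
    return rows, y


random.seed(0)
(b0, y0), (b1, y1) = toy_block([2.0, 0.0, 0.0]), toy_block([0.0, 0.3, 0.0])
X = block_diag([b0, b1])  # one block per response, shared regularization strength
Y = y0 + y1

# Any lam with lam/2 above max_j |x_j . Y| shrinks every coefficient to zero.
lam_max = 2.0 * max(abs(sum(X[i][j] * Y[i] for i in range(len(X)))) for j in range(len(X[0])))
beta_strong = elastic_net(X, Y, lam_max + 1.0)
beta_weak = elastic_net(X, Y, 0.01)

for lam in (lam_max + 1.0, 5.0, 0.01):
    beta = elastic_net(X, Y, lam)
    print(f"lambda={lam:.2f} non-zero coefficients: {sum(1 for b in beta if b != 0.0)}")
```

Because the design matrix is block-diagonal, each response has its own coefficients, yet a single λ governs sparsity across all blocks; lowering λ lets weaker associations enter the model alongside the dominant ones.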
Extended Data Fig. 8 | Functional analysis of microbiota samples. To distinguish samples predicted to increase rates of WBCs, a microbiota potency score was calculated from posterior coefficients (Fig. 3, Methods) and the relative abundance of taxa in samples. Bars show linear discriminant analysis (LDA) scores of MetaCyc pathway profiles from 124 shotgun sequenced samples that distinguished positive and negative potency samples the most (LDA-score magnitude in the 95th percentile). Highlighted pathways are discussed in the main text. For each pathway, we tested whether pathway presence was enriched (depleted) in positive (negative) potency samples using one-sided Fisher’s exact test; ***P < 0.001, **P < 0.01, *P < 0.05.
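A one-sided Fisher's exact test on pathway presence, as named in the caption above, can be written directly from the hypergeometric distribution. The sketch below is illustrative only: the function name `fisher_one_sided` and the example counts are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """Upper-tail P value, P(X >= a), for the 2x2 table [[a, b], [c, d]].

    a: positive-potency samples with the pathway
    b: positive-potency samples without the pathway
    c: negative-potency samples with the pathway
    d: negative-potency samples without the pathway
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    denom = comb(n, col1)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / denom


# Hypothetical counts for one pathway across 124 samples (not study data).
p_enriched = fisher_one_sided(40, 20, 20, 44)
print(f"one-sided P for enrichment: {p_enriched:.3g}")
```

Testing depletion instead of enrichment amounts to swapping the two rows of the table, so the same function covers both directions mentioned in the caption.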
Extended Data Fig. 9 | Abundance profiles of bacterial genera across analysed samples. a, The relative non-zero abundance of Staphylococcus is inversely related to microbiome alpha diversity, bold line: regression line from a linear model of the mean of the log10 Staphylococcus relative abundance, shaded: 95% confidence intervals (n = 1,381 samples with non-zero Staphylococcus abundances). b, Abundance profiles of the two genera, Faecalibacterium and Ruminococcus 2, most strongly associated with WBC increase; number of times detected (left) and log10 abundance distribution when above detection (right).
Extended Data Fig. 10 | Survival analysis and confirmation of model results with different priors. a, Kaplan–Meier plot of patient 3-year survival with sufficient available blood data (Supplementary Information, Extended Data Fig. 1). b, Posterior association coefficients do not depend on the choice of prior for σ in the main Bayesian model. Plotted are the posterior means from our main analysis against the equivalent inference with an inverse Gamma prior (alpha = 1, beta = 1).
Corresponding author(s): Jonas Schluter, Joao B. Xavier
Reporting Summary
Statistics
n/a Confirmed

Software and code

Data collection no software was used for data collection
Data analysis python3.7.0, R3.6.1, DADA2, Humann2, ChocoPhlAn, MetaCyc, PyMC3, sklearn-0.23.2, https://github.com/jsevo/wbcdynamics_microbiome/
Data
April 2020
The data used in our study is organized in Excel compatible comma separated value files as supplementary tables (data-tables.zip). All sequencing data have been
made available publicly, and the NCBI SRA accession numbers are listed in the tables below:
1. cGENUS.csv: relative taxon abundances in fecal microbiota samples from 12,633 stool samples
2. cHCTMETA.csv: HCT characteristics
3. cINFECTIONS.csv: positive blood culture results
4. cMISAMPLES.csv: NCBI SRA accession number, diversity (inverse Simpson index), total 16S (where available), stool consistency for each fecal microbiota sample
5. cMED.csv: medication data
6. cPIDMETA.csv: anonymized patient demographics
7. cWBC.csv: absolute counts of neutrophils, lymphocytes, monocytes, eosinophils, and platelets with indication if included in analyses

8. cDUKE__GENUS.csv: relative taxon abundances in fecal microbiota samples from 12,633 stool samples
9. cDUKE__WBC.csv: absolute counts of neutrophils, lymphocytes, monocytes, eosinophils, and platelets with indication if included in analyses
10. cDUKE__MED.csv: medication data
11. cFMT_analysis.csv: convenience table for Figure 2
Metadata and processed sequencing data are made available on a public repository via Figshare:
meta data: doi.org/10.6084/m9.figshare.12016986.v4

samples: doi.org/10.6084/m9.figshare.12016983.v4
16S counts: doi.org/10.6084/m9.figshare.12016989.v3
16S taxonomy: doi.org/10.6084/m9.figshare.12016992.v1

Sample size Throughout the manuscript, sample sizes are specified. There are many nested analyses on various subsets, and each analysis specifies the
sample sizes. Sample sizes were not predetermined for retrospective analyses, instead all data from electronic health records from allo-HCT
patients since 2003 were used, with specific exclusion criteria listed in the data exclusion section.
Data exclusions Non-adults, non-engrafted patients were excluded. Data from patients without valid two-day apart sample pairs were excluded. The
supplementary methods provide a detailed flow chart of data inclusion/exclusion.
Replication All analyses can be reproduced with the code and data provided. No experiments were conducted.
Randomization N/A as no trial was conducted for this study. The randomized data was previously published and the randomization procedure is explained in
the relevant reference (Taur et al. 2018)
Blinding N/A. No trial was conducted.



Policy information about studies involving human research participants

Population characteristics Tables S1-S5 list the patient population characteristics.
Recruitment No patients were specifically recruited for this work. Allo-HCT patients since 2003 were considered and included or excluded
as detailed in the data inclusion/exclusion section.
Ethics oversight The participants in the auto-FMT trial (NCT02269150) provided written informed consent to participate in the trial (#14-025).
Ethics oversight Participants in the observational cohorts at both Memorial Sloan Kettering Cancer Center and at Duke University School of
Medicine provided written informed consent for the use of their fecal specimens and clinical data. The use and analysis of

these specimens for the work herein was approved by IRBs at both institutions: MSK (#16-834) and Duke (PRO0006268 and
Pro00050975).
Clinical data
Policy information about clinical studies
All manuscripts should comply with the ICMJE guidelines for publication of clinical research and a completed CONSORT checklist must be included with all submissions.
Clinical trial registration NIH clinicaltrials.gov : NCT02269150

https://clinicaltrials.gov/ct2/show/NCT02269150
Study protocol https://clinicaltrials.gov/ct2/show/NCT02269150
Data collection This is a randomized, open-label, controlled study designed to assess the efficacy of autologous fecal microbiota transplantation
(auto-FMT) for prevention of Clostridium difficile infection (CDI) in patients who have undergone allogeneic hematopoietic stem cell
transplantation (allo-HSCT). Patients will be enrolled prior to allo-HSCT; feces will be collected and stored from all participating
subjects prior to the initiation of conditioning regimens, analyzed by deep 16S rRNA gene sequencing, and tested by assay for
intestinal pathogens including Clostridium difficile. Later in the course of transplantation, following engraftment (defined as the first
day of three consecutive days on which the absolute blood neutrophil count is at or above 500 per mm³), subjects will undergo fecal testing
for presence of Bacteroidetes by 16S PCR. Subjects will be eligible for study if they have a microbiologically diverse pre-transplant
colonic microbiota, and if the post-engraftment specimen contains Bacteroidetes at a prevalence equal to or below (0.1%)
Actual Study Start Date : October 2014

Estimated Primary Completion Date : October 2021
Outcomes Primary Outcome: Clostridium difficile infection (CDI) [ Time Frame: up to 1 year following randomization ]
CDI is defined as diarrheal stool (unformed stool conforming to the shape of a specimen container), and a positive test for toxin-
producing C. difficile (either by toxin B gene PCR or cytotoxicity assay).
Article
Sex differences in immune responses that underlie COVID-19 disease outcomes

https://doi.org/10.1038/s41586-020-2700-3
Received: 4 June 2020
Accepted: 19 August 2020
Published online: 26 August 2020

Takehiro Takahashi¹,²¹, Mallory K. Ellingson²,²¹, Patrick Wong¹,²¹, Benjamin Israelow¹,³,²¹, Carolina Lucas¹,²¹, Jon Klein¹,²¹, Julio Silva¹,²¹, Tianyang Mao¹,²¹, Ji Eun Oh¹, Maria Tokuyama¹, Peiwen Lu¹, Arvind Venkataraman¹, Annsea Park¹, Feimei Liu¹,⁴, Amit Meir⁵, Jonathan Sun⁶, Eric Y. Wang¹, Arnau Casanovas-Massana², Anne L. Wyllie², Chantal B. F. Vogels², Rebecca Earnest², Sarah Lapidus², Isabel M. Ott²,⁷, Adam J. Moore², Yale IMPACT Research Team*, Albert Shaw³, John B. Fournier³, Camila D. Odio³, Shelli Farhadian³, Charles Dela Cruz⁸, Nathan D. Grubaugh², Wade L. Schulz⁹,¹⁰, Aaron M. Ring¹, Albert I. Ko², Saad B. Omer²,³,¹¹,¹² & Akiko Iwasaki¹,¹³ ✉

There is increasing evidence that coronavirus disease 2019 (COVID-19) produces more
severe symptoms and higher mortality among men than among women1–5. However,
whether immune responses against severe acute respiratory syndrome coronavirus
(SARS-CoV-2) differ between sexes, and whether such differences correlate with the
sex difference in the disease course of COVID-19, is currently unknown. Here we
examined sex differences in viral loads, SARS-CoV-2-specific antibody titres, plasma
cytokines and blood-cell phenotyping in patients with moderate COVID-19 who had
not received immunomodulatory medications. Male patients had higher plasma
levels of innate immune cytokines such as IL-8 and IL-18 along with more robust
induction of non-classical monocytes. By contrast, female patients had more robust
T cell activation than male patients during SARS-CoV-2 infection. Notably, we found
that a poor T cell response negatively correlated with patients’ age and was associated
with worse disease outcome in male patients, but not in female patients. By contrast,
higher levels of innate immune cytokines were associated with worse disease
progression in female patients, but not in male patients. These findings provide a
possible explanation for the observed sex biases in COVID-19, and provide an
important basis for the development of a sex-based approach to the treatment and
care of male and female patients with COVID-19.
SARS-CoV-2 is the novel coronavirus first detected in Wuhan, China, in November 2019 that causes COVID-19⁶. On 11 March 2020, the World Health Organization (WHO) declared COVID-19 a pandemic⁷. A growing body of evidence reveals that male sex is a risk factor for a more severe disease, including death. Globally, approximately 60% of deaths from COVID-19 are reported in men⁵, and a cohort study of 17 million adults in England reported a strong association between male sex and the risk of death from COVID-19 (hazard ratio 1.59, 95% confidence interval 1.53–1.65)⁸.
Past studies have shown that sex has a considerable effect on the outcome of infections and has been associated with underlying differences in immune responses to infection⁹,¹⁰. For example, the prevalence of hepatitis A and tuberculosis is notably higher in men than in women¹¹. Viral loads are consistently higher in male patients with hepatitis C virus and human immunodeficiency virus (HIV)¹²,¹³. By contrast, women mount a more robust immune response to vaccines¹⁴. These findings collectively suggest a more robust ability among women to control infectious agents. However, the mechanism by which SARS-CoV-2 causes more severe disease in male patients than in female patients remains unknown.
To determine the immune responses against SARS-CoV-2 infection in male and female patients, we performed detailed analyses on the sex differences in immune phenotypes by the assessment of viral loads, levels of SARS-CoV-2-specific antibodies, plasma cytokines or chemokines, and blood-cell phenotypes.

Overview of the study design
Patients who were admitted to the Yale-New Haven Hospital between 18 March and 9 May 2020 and were confirmed positive for SARS-CoV-2
¹Department of Immunobiology, Yale University School of Medicine, New Haven, CT, USA.
²Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA.
³Department of Medicine, Section of Infectious Diseases, Yale University School of Medicine, New Haven, CT, USA.
⁴Department of Biomedical Engineering, Yale School of Engineering & Applied Science, New Haven, CT, USA.
⁵Boyer Center for Molecular Medicine, Department of Microbial Pathogenesis, Yale University, New Haven, CT, USA.
⁶Department of Comparative Medicine, Yale University School of Medicine, New Haven, CT, USA.
⁷Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA.
⁸Department of Medicine, Section of Pulmonary and Critical Care Medicine, Yale University School of Medicine, New Haven, CT, USA.
⁹Department of Laboratory Medicine, Yale University School of Medicine, New Haven, CT, USA.
¹⁰Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, CT, USA.
¹¹Yale Institute for Global Health, Yale University, New Haven, CT, USA.
¹²Yale School of Nursing, Yale University, Orange, CT, USA.
¹³Howard Hughes Medical Institute, Chevy Chase, MD, USA.
²¹These authors contributed equally: Takehiro Takahashi, Mallory K. Ellingson, Patrick Wong, Benjamin Israelow, Carolina Lucas, Jon Klein, Julio Silva, Tianyang Mao.
*A list of authors and their affiliations appears at the end of the paper.
✉e-mail: akiko.iwasaki@yale.edu

by RT–PCR from nasopharyngeal and/or oropharyngeal swabs in CLIA-certified laboratories were enrolled through the IMPACT biorepository study¹⁵. In the IMPACT study, biospecimens including blood, nasopharyngeal swabs, saliva, urine and stool samples were collected at study enrolment (baseline denotes the first time point) and longitudinally on average every 3 to 7 days (serial time points). The detailed demographics and clinical characteristics of these 98 participants are shown in Extended Data Table 1. Plasma and peripheral blood mononuclear cells (PBMCs) were isolated from whole blood, and plasma was used for titre measurements of SARS-CoV-2 spike S1 protein-specific IgG and IgM antibodies (anti-S1-IgG and -IgM) and cytokine or chemokine measurements. Freshly isolated PBMCs were stained and analysed by flow cytometry¹⁵. We obtained longitudinal serial time-point samples from a subset of these 98 study participants (n = 48; information in Extended Data Table 1). To compare the immune phenotypes between sexes, two sets of data analyses were performed in parallel: baseline and longitudinal, as described below. As a control group, healthcare workers (HCWs) from Yale-New Haven Hospital were enrolled who were uninfected with COVID-19. Demographics and background information for the HCW group and the demographics of HCWs for cytokine assays and flow cytometry assays for the primary analyses are in Extended Data Table 1. Demographic data, time-point information of the samples defined by the days from the symptom onset (DFSO) in each patient, treatment information, and raw data used to generate figures and tables are in Supplementary Table 1.

[Figure 1: panels a–c. a, viral RNA as log10[SARS-CoV-2 (copies ml⁻¹)] in Np swab and saliva; b, anti-S1-IgG and anti-S1-IgM (A450 nm); c, IL-18, IL-6, IL-8, CXCL10, CCL5 and CCL8 (pg ml⁻¹). Groups: M_HCW, F_HCW, M_Pt and F_Pt.]
Fig. 1 | Comparison of viral RNA concentrations, titres of anti-SARS-CoV-2 antibodies, and plasma cytokines and chemokine levels at the first sampling of cohort A patients. a, Comparison of viral RNA measured from nasopharyngeal (Np) swab and saliva. n = 14 for male and female patients (M_Pt and F_Pt, respectively) for nasopharyngeal samples, and n = 9 and 12, respectively, for saliva samples. Dotted lines indicate the detection limit of the assay (5,610 copies ml⁻¹), and negatively tested data are shown on the x axis. ND, not detected. b, Titres of specific IgG and IgM antibodies against SARS-CoV-2 S1 protein were measured. n = 13, 74, 15 and 20 for IgG, and n = 3, 18, 15 and 20 for IgM, for male HCW (M_HCW), female HCW (F_HCW), M_Pt and F_Pt, respectively. The cut-off values for positivity are shown by the dotted lines. c, Comparison of the plasma levels of representative innate immune cytokines and chemokines. n = 15, 28, 16 and 19 for M_HCW, F_HCW, M_Pt and F_Pt, respectively. P values were determined by unpaired two-tailed t-test (a) or one-way analysis of variance (ANOVA) with Bonferroni multiple comparison test (b, c). All P values < 0.10 are shown. Data are mean ± s.e.m. The results of all the cytokines or chemokines measured can be found in Extended Data Fig. 1b.

Baseline analysis
The baseline analysis was performed on samples from the first time point of patients who met the following criteria: not in intensive care unit (ICU), had not received tocilizumab, and had not received high doses of corticosteroids (prednisone equivalent of more than 40 mg) before the first sample collection date. This patient group, cohort A, consisted of 39 patients (17 male and 22 female) (Extended Data Tables 1, 2). Intersex and transgender individuals were not represented in this study. Figures 1–4 represent analyses of baseline raw values obtained from patients in cohort A. In cohort A patients, male and female patients were matched in terms of age, body mass index (BMI), and DFSO at the first time point sample collection (Extended Data Fig. 1a). However, there were significant differences in age and BMI between HCW controls and patients (patients had higher age and BMI values) (Extended Data Table 1), and therefore an age- and BMI-adjusted difference-in-differences analysis was also performed in parallel (Extended Data Table 3).
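The core of a difference-in-differences comparison can be shown with a minimal, unadjusted sketch: the patient-versus-HCW change in men minus the same change in women. The analysis in the paper additionally adjusts for age and BMI, which would require a regression model and is not reproduced here; the function name `difference_in_differences` and all values below are hypothetical.

```python
from statistics import mean


def difference_in_differences(values):
    """(M_Pt - M_HCW) - (F_Pt - F_HCW), using unadjusted group means."""
    male_change = mean(values["M_Pt"]) - mean(values["M_HCW"])
    female_change = mean(values["F_Pt"]) - mean(values["F_HCW"])
    return male_change - female_change


# Hypothetical cytokine levels in pg per ml, not data from the study.
toy = {
    "M_HCW": [10.0, 12.0, 14.0],
    "F_HCW": [11.0, 13.0, 15.0],
    "M_Pt": [40.0, 50.0, 60.0],
    "F_Pt": [30.0, 33.0, 36.0],
}
did = difference_in_differences(toy)
print(f"difference-in-differences: {did:.1f} pg/ml")
```

A positive value means the increase from HCW to patient is larger in men than in women. Subtracting the HCW baseline within each sex keeps a pre-existing sex difference among uninfected controls from being mistaken for a disease-related one.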
Longitudinal analysis
As parallel secondary analyses, we performed longitudinal analysis on a total patient cohort (cohort B) to evaluate the difference in immune response over the course of the disease between male and female patients. Cohort B included all patient samples from cohort A (including several time-point samples from the cohort A patients) as well as an additional 59 patients who did not meet the inclusion criteria for cohort A. Because cohort B included more severely affected patients in ICU, the average clinical scores were higher in cohort B than in cohort A (mean ± s.d.: 1.3 ± 0.5 (female) and 1.4 ± 0.5 (male) for cohort A, and 2.5 ± 1.5 (female) and 2.7 ± 1.3 (male) for cohort B) (Extended Data Table 1). This analysis included several time-point samples from 98 participants in total. Data from cohort B were analysed for sex differences in immune responses among patients using longitudinal analysis, controlling for potential confounding by age, BMI, receipt of immunomodulatory treatment (tocilizumab or corticosteroids), DFSO and ICU status. Second, we conducted a longitudinal analysis that compared male and female patients with COVID-19 to male and female HCWs, controlling for age and BMI. Adjusted least square means difference over time in immune responses between male and female patients with COVID-19 (Extended Data Table 4) and adjusted least square means difference over time in immune responses between male and female patients with COVID-19 and male and female HCWs (Extended Data Table 5) were calculated.

Sex differences in cytokines and chemokines
We first compared the concentrations of viral RNA of male and female patients. For both cohorts A and B, there was no difference by sex in terms of the viral RNA concentrations in nasopharyngeal swab and saliva (Fig. 1a, Extended Data Tables 3, 4).
Anti-SARS-CoV-2 S1-specific IgG and IgM (anti-S1-IgG and -IgM) antibodies were comparable in infected male and female in cohort A (Fig. 1b) and in cohort B (Extended Data Tables 4, 5). Thus, at baseline and during the course of the disease, there were no clear differences in the amount of IgG or IgM generated against the S1 protein between male and female patients.
Next, we analysed the levels of 71 cytokines and chemokines in the plasma. Levels of many pro-inflammatory cytokines, chemokines and growth factors, including IL-1β, IL-6, IL-8, TNF, CCL2, CXCL10 and G-CSF, are increased in the plasma of patients with COVID-19¹⁶. In line with previous reports, levels of inflammatory cytokine or chemokine were
factors were increased in patients compared to HCWs in both sexes, and the levels between sexes were comparable (Fig. 1c, Extended Data Fig. 1b, Extended Data Table 3). However, levels of IL-8 and IL-18 were significantly higher in male patients than in female patients in cohort A (Fig. 1c). In age- and BMI-adjusted analyses of cohort A, we found that although IL-8 and IL-18 were no longer significantly higher among male patients than among female patients, IL-8 and CXCL10 were significantly increased in male patients compared to male HCWs than in female patients compared to female HCWs (difference-in-differences, Extended Data Table 3). In adjusted analyses of cohort B, although we did not see significant sex differences in the levels of IL-8 and IL-18, we found significantly higher levels of CCL5 in male patients than in female patients over the course of the disease (Extended Data Table 4) and significantly increased levels of CCL5 in male patients compared to male HCWs than in female patients compared to female HCWs (Extended Data Table 5, difference-in-differences). These data indicated that, although levels of most of the innate inflammatory cytokines and chemokines were comparable, there were a few factors that are more robustly increased at the baseline (IL-8 and IL-18) and during the course of the disease (CCL5) in male patients than in female patients.

Monocyte differences by sex
Next, we examined the immune cell phenotype by flow cytometry. Freshly isolated PBMCs were stained with specific antibodies to identify T cells, B cells, natural killer T cells, natural killer cells, monocytes, macrophages and dendritic cells to investigate the composition of PBMCs (Extended Data Fig. 2b). Consistent with a previous report on a decrease in T cells in patients¹⁶, in cohort A, the proportion of T cells in the live cells was significantly lower in patients, whereas the proportion of B cells was higher in both male and female patients than in HCWs (Fig. 2a, Extended Data Table 3). There was no difference in the numbers of B cells across all groups, but the numbers of T cells were lower in patients of both sexes (data not shown). By contrast, in cohort B, we found that male patients had significantly lower numbers of T cells, both total counts and as a proportion of live cells, over the course of the disease than female patients (Extended Data Table 4). Next, we found higher populations of monocytes in both

[Figure 2: panels a–e. Recoverable axis labels: B cells and T cells (% of live); CD14-PE-Cy7 versus CD16-Brilliant Violet 786 with cMono (CD14+ CD16–), intMono (CD14+ CD16+) and ncMono (CD14dim CD16+) gates for male and female HCWs and patients; total Mono, cMono, intMono and ncMono (% of live); age (years), BMI (kg m⁻²), DFSO, T cells (% of live), IL-18 and CCL5 (pg ml⁻¹); ncMono (% of live) versus log10[CCL5 (pg ml⁻¹)], with R = –0.210, P = 0.387 (F) and R = 0.635, P = 0.011 (M).]
Fig. 2 | Differences in composition of PBMCs between male and female
patients in cohort A at the first sampling. a, Comparison on the proportion of sexes in cohort A (Fig. 2b, c, Extended Data Fig. 2b) compared to HCWs.
B cells (top) and T cells (bottom) in live PBMCs. n = 6, 42, 16 and 21 for M_HCW, F_ Although CD14+CD16− classical monocytes were comparable across all
HCW, M_Pt and F_Pt, respectively. b, Representative 2D plots for CD14 and CD16 groups, levels of CD14+CD16+ intermediate monocytes were increased
in monocytes gate (live/singlets/CD19−CD3−/CD56−CD66b−). Numbers in red in patients compared with HCWs, and this increase was more robust in
indicate the percentages of each population in the parent monocyte gate. female patients (Fig. 2b, c). By contrast, male patients had higher levels of
c, Comparison between percentages of total monocytes, classical monocytes CD14loCD16+ non-classical monocytes than controls and female patients
(cMono), intermediate monocytes (intMono) and non-classical monocytes (Fig. 2b, c). These differences were observed in age- and BMI-adjusted
(ncMono) in the live PBMCs. n = 6, 42, 16 and 21 for M_HCW, F_HCW, M_Pt and analyses, too, but were not significant (Extended Data Table 3).
F_Pt, respectively. d, Comparison of age, BMI, DFSO, T cells (percentage of live
We then divided the 17 cohort A male patients into two groups, namely,
PBMCs) and plasma IL-18 and CCL5 levels between male patients who had high
a ‘high’ group who had high percentages of non-classical monocytes
non-classical monocytes and low-intermediate non-classical monocytes. n = 13
(upper quartile 4 patients, all had more than 5% of non-classical mono-
and 4 for ‘low-int’ and ‘high’ group, respectively, for age, BMI and DFSO. n = 12
and 4 for ‘low-int’ and ‘high’ group, respectively, for T cells and IL-18 or CCL5
cytes) and a ‘low-intermediate’ group (others, 13 patients). We compared
levels. e, Correlation between plasma CCL5 levels and non-classical monocytes age, BMI, DFSO, T cells, and plasma levels of IL-18 and CCL5. Although
(percentage of live cells). Pearson correlation coefficients (R) and P values we found no differences in age, BMI or DFSO (Fig. 2d), we noted that the
for each sex are shown. Lines represent linear regression lines and shading group with high levels of non-classical monocytes had significantly lower
represents 95% confidence intervals for each sex. ncMono-high male patients levels of T cells and higher levels of CCL5 in plasma (Fig. 2d). In addition,
(n = 4) are shown with orange open squares, and ncMono-low-int male patients we found a significant correlation between CCL5 levels and abundance
(n = 11) are shown with orange closed squares. n = 19 for female patients (purple in non-classical monocytes only in male patients (Fig. 2e). These findings
circles). Data are mean ± s.e.m. in a, c and d. P values were determined by one-way suggest that the progression from classical to non-classical monocytes
ANOVA with Bonferroni multiple comparison test (a, c) or unpaired two-tailed may be arrested at the intermediate stage in female patients, and that
t-test (d). All P values < 0.10 are shown. increased innate inflammatory cytokines and chemokines are associated
with more robust activation of innate immune cells at the baseline as well
generally higher in patients than in controls (Fig. 1c, Extended Data as more robust longitudinal T cell decrease in male patients.
Figs. 1b, 2a, Extended Data Table 3). The levels of type-I, -II or -III inter-
feron (IFN) were comparable between the sexes in cohort A (Extended
Data Fig. 1b, Extended Data Table 3). However, we found higher levels Higher T cell activation in female
of IFNα2 in female patients than in male patients in cohort B (Extended We further examined the T cell phenotype in patients with COVID-19.
Data Table 4). The levels of many cytokines, chemokines and growth The composition of overall CD4-positive and CD8-positive cells among

T cells were similar between all groups in cohort A (Fig. 3a, Extended Data Fig. 2c, Extended Data Table 3). Detailed phenotyping of T cells for naive T cells, central or effector memory T (TCM/TEM) cells, follicular helper T (TFH) cells and regulatory T (Treg) cells revealed no remarkable differences in the frequency of these subsets between sexes (Extended Data Fig. 2c). However, we observed higher levels of CD38 and HLA-DR-positive activated T cells in female patients than in male patients (Fig. 3b, c). In parallel, PD-1- and TIM-3-positive terminally differentiated T cells were more prevalent among female patients than male patients (Fig. 3d, e). These findings were seen in both CD4 and CD8 T cells, but the differences were more robust in CD8 T cells (Fig. 3c, e, Extended Data Table 3). We also stained for intracellular cytokines such as IFNγ, granzyme B (GzB), TNF, IL-6 and IL-2 in CD8 T cells, and IFNγ, TNF, IL-17A, IL-6 and IL-2 in CD4 T cells. Levels of these cytokines were higher in patients than in controls, and were generally comparable between sexes in the patients (Extended Data Fig. 2d). Analyses of T cell phenotypes in cohort B did not reveal any significant differences between sexes (Extended Data Tables 4 and 5). Therefore, female patients with COVID-19 had more abundant activated and terminally differentiated T cell populations than male patients at baseline in unadjusted analyses.

[Figure 3 graphic not reproduced: panels a–e.]

Fig. 3 | Sex difference in T cell phenotype at the first sampling of cohort A patients. a, Percentages of CD4 and CD8 in the CD3-positive cells. b, Representative 2D plots for CD38 and HLA-DR in the CD4 and CD8 T cells. Numbers in red indicate the percentages of CD38+HLA-DR+ populations in the parent gate (live/singlets/CD3+/CD4+ or CD8+). c, Percentages of CD38+HLA-DR+ CD4 or CD8 cells in CD3-positive cells. d, Representative 2D plots for PD-1 and TIM-3 in the CD4 and CD8 T cells. Numbers in red indicate the percentages of PD-1+TIM-3+ populations in the parent gate (live/singlets/CD3+/CD4+ or CD8+/CD45RA−). e, Percentages of PD-1+TIM-3+ CD4 or CD8 cells in CD3-positive cells. n = 6, 45, 16 and 22 for M_HCW, F_HCW, M_Pt and F_Pt, respectively. P values were determined by one-way ANOVA with Bonferroni multiple comparison test. Data are mean ± s.e.m. All P values < 0.10 are shown.

Sex-dependent immunity and disease course
We investigated whether certain immune phenotypes were correlated with disease trajectory, and whether these phenotypes and factors differed between the sexes. To this end, we evaluated the disease course of patients in cohort A. The clinical scores at the first sample collection (C1) were 1 or 2 for all of the patients in cohort A. Patients were categorized into a ‘deteriorated’ group if they reached a score of 3 or higher after the first sample collection date as their maximum clinical score during admission (Cmax). By contrast, patients who maintained a score of 1 or 2 were categorized as ‘stabilized’ (Extended Data Table 2). Among both male (n = 17) and female (n = 22) patients from cohort A, 6 patients of each sex deteriorated during the course of the disease (35.3% and 27.3%, respectively), and the intervals between the dates at which the patients reached Cmax (DFSO at Cmax) and the first sample collection (DFSO at C1) were not significantly different between deteriorated male and female patients (mean ± s.d. = 3.7 ± 4.1 and 4.2 ± 2.7, respectively; P = 0.81 by unpaired two-tailed t-test).

We first examined age, BMI, viral loads and titres of anti-S1-IgG antibodies between the stabilized and deteriorated groups in a sex-aggregated manner. We found that the deteriorated group had on average a higher BMI than the stabilized group. Although the age was not statistically different, the stabilized group spanned a larger age range than the deteriorated group, who were generally of a more advanced age. The viral load and antibody titres were comparable (Fig. 4a). Next, we examined these factors in a sex-disaggregated manner, and found that the deteriorated male (M_deteriorated) group was on average significantly older than the stabilized male (M_stabilized) group, whereas the two female groups (F_deteriorated and F_stabilized) were comparable in age (Fig. 4b). In addition, BMI was higher for the M_deteriorated than the M_stabilized group, whereas there was no difference in BMI between the F_deteriorated and F_stabilized groups (Fig. 4b). By contrast, the F_deteriorated group had higher viral load in saliva than the F_stabilized group, whereas there was no difference in the male groups (Fig. 4b). The levels of antibodies were comparable between the deteriorated and stabilized groups in both male and female patients, but stabilized female patients tended to have higher antibody levels (Fig. 4b).

We further investigated whether the key factors identified in the previous analyses correlated with disease progression in male and female patients. We observed that regardless of sex, some chemokines and growth factors, such as CXCL10 (also known as IP-10) and M-CSF, were increased in patients that went on to develop worse disease. However, there were some innate immune factors, such as CCL5, TNFSF10 (also known as TRAIL) and IL-15, that were specifically increased only in female patients that subsequently progressed to worse disease,

but this difference was not observed in male patients (Fig. 4c). In the age- and DFSO-adjusted analysis of cohort A, we also found that CCL5 was only increased in female patients that progressed to worse disease compared to the stabilized patients, but no such correlation was found in male patients (Extended Data Table 6).

[Figure 4 graphic not reproduced: panels a–f. Panel f annotations: left, R = 0.103, P = 0.647 (F) and R = −0.523, P = 0.038 (M); right, R = 0.245, P = 0.272 (F) and R = −0.529, P = 0.035 (M).]

Fig. 4 | Differential immune phenotypes at the first sampling and disease progression between sexes in cohort A patients. a, b, Sex-aggregated (a) and sex-disaggregated (b) comparison of age, BMI, RNA concentration in nasopharyngeal swab and saliva, and anti-S1-IgG antibodies between the stabilized and deteriorated group. n = 11, 6, 16 and 6 for age and BMI, n = 9, 5, 9 and 5 for nasopharyngeal swab, n = 6, 3, 8 and 4 for saliva, and n = 10, 5, 14 and 6 for anti-S1-IgG antibodies, for M_stabilized, M_deteriorated, F_stabilized and F_deteriorated group, respectively. Dotted lines in the viral concentration and anti-S1-IgG panels indicate the detection limit and cut-off value for positivity, respectively. c, Cytokine or chemokine comparison between stabilized and deteriorated groups. n = 10, 6, 14 and 5 for the M_stabilized, M_deteriorated, F_stabilized and F_deteriorated groups, respectively. d, Comparisons in the proportions of activated (CD38+HLA-DR+) and terminally differentiated (PD-1+TIM-3+) CD4 or CD8 T cells, and IFNγ+CD8 T cells in CD3-positive T cells are shown. n = 10, 6, 16 and 6 for M_stabilized, M_deteriorated, F_stabilized and F_deteriorated group, respectively. e, Pearson correlation heat maps of the indicated parameters are shown for each sex. For viral RNA concentrations and cytokine or chemokine levels, log-transformed values were used for the calculation of the correlations. The size and colour of the circles indicate the correlation coefficient (R), and only statistically significant correlations (P < 0.05) are shown. Clinical deterioration from the first time point was scored by Cmax − C1. n = 17 and 22 for male and female, respectively. f, Correlation between age and CD38+HLA-DR+ CD8 T cells (left) and IFNγ+CD8 T cells (right). Pearson correlation coefficient (R) and P values for each correlation and sex are shown. Lines represent linear regression lines and shading represents 95% confidence intervals for each sex. P values were determined by unpaired two-tailed t-test in a–d. Data are mean ± s.e.m. All P values < 0.10 are shown.

T cell phenotypes in these groups showed that male patients whose disease worsened had a significantly lower proportion of activated T cells (CD38+HLA-DR+) and terminally differentiated T cells (PD-1+TIM-3+), and a tendency towards fewer IFNγ+ CD8 T cells at the first sample collection, compared with their male counterparts who did not progress to worse disease (Fig. 4d). However, in female patients, the deteriorated group had similar levels of these types of CD8 T cells compared with the stabilized group (Fig. 4d).

We finally examined the correlations between age, BMI, viral loads, anti-S1 antibodies, cytokines or chemokines, activated or terminally differentiated or IFNγ-producing CD8 T cells, and clinical disease course (‘Cmax − C1’ was used for the deterioration score). The correlation matrix showed that in female patients, higher levels of innate immunity cytokines, such as TNFSF10 and IL-15, were positively correlated with disease progression, whereas there was no association between CD8 T cell status and deterioration (Fig. 4e, results of age- and DFSO-adjusted analysis in Extended Data Table 6). In particular, CXCL10, M-CSF and IL-15 were positively correlated with IFNγ+CD8 T cells in female patients (Fig. 4d).

By contrast, in male patients, progressive disease was associated with higher age, higher BMI, and poor CD8 T cell activation (Fig. 4e).
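The grouping rule and deterioration score used in this section can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' analysis code: the `Patient` record, the `marker` field and all function names are hypothetical, and only the rules themselves (score of 3 or higher means ‘deteriorated’, deterioration score is Cmax − C1, Pearson correlation on log-transformed marker levels) are taken from the text.

```python
import math
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical record; field names are illustrative only."""
    sex: str       # "M" or "F"
    c1: int        # clinical score at first sample collection (1 or 2 in cohort A)
    cmax: int      # maximum clinical score during admission
    marker: float  # plasma level of some cytokine or chemokine, must be > 0


def disease_group(p: Patient) -> str:
    """'deteriorated' if the maximum score reached 3 or higher, else 'stabilized'."""
    return "deteriorated" if p.cmax >= 3 else "stabilized"


def deterioration_score(p: Patient) -> int:
    """Deterioration from the first time point, scored as Cmax - C1."""
    return p.cmax - p.c1


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def marker_vs_deterioration(patients, sex):
    """Correlate log10(marker) with the deterioration score within one sex."""
    selected = [p for p in patients if p.sex == sex]
    xs = [math.log10(p.marker) for p in selected]
    ys = [deterioration_score(p) for p in selected]
    return pearson_r(xs, ys)
```

Computing the correlation separately per sex mirrors the sex-disaggregated heat maps described for Fig. 4e; significance testing and the age- and DFSO-adjusted models are outside the scope of this sketch.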

Poor CD8 T cell activation and poor production of IFNγ by CD8 T cells were significantly correlated with patients’ age, whereas these correlations were not seen in female patients (Fig. 4e, f). These differences seemed to highlight the differences between the sexes in the immune responses against SARS-CoV-2 as well as the difference of the potential prognostic or predictive factors for clinical deterioration of COVID-19.

Discussion
Our results revealed key differences in immune responses during the disease course of SARS-CoV-2 infection in male and female patients. First, we found that the levels of several important pro-inflammatory innate immunity chemokines and cytokines such as IL-8, IL-18 (at baseline) and CCL5 (longitudinal analysis) were higher in male patients, which correlated with higher non-classical monocytes (at baseline). Second, we observed a more robust T cell response among female patients than male patients at baseline. In particular, activated CD8 T cells were significantly increased only in female patients but not in male patients compared with healthy volunteers. Analysis of their clinical trajectory showed that, although poor T cell responses were associated with future progression of disease in male patients, higher levels of innate immune cytokines were associated with worsening of COVID-19 disease in female patients. Notably, the T cell response was significantly and negatively correlated with patients’ age in male, but not female, patients. These data indicate key differences in the baseline immune capabilities in male and female patients during the early phase of SARS-CoV-2 infection, and suggest a potential immunological underpinning of the distinct mechanisms of disease progression between sexes. These analyses also provide a potential basis for taking sex-dependent approaches to prognosis, prevention, care, and therapy for patients with COVID-19.

Although our study provides a strong basis for further investigation into how COVID-19 disease dynamics may differ between male and female patients, it is important to note that there are some limitations to the analyses presented in this Article. First, we acknowledge that the healthy HCWs used as the control population were not matched to patients on the basis of age, BMI or underlying risk factors. To account for this, we performed adjusted analyses for the baseline and longitudinal comparisons between patients (cohort A and the full patient population, cohort B) and HCWs, controlling for age and BMI. However, we cannot rule out residual confounding due to underlying risk factors not available for the HCW controls.

Collectively, these data suggest that vaccines and therapies to increase T cell immune responses to SARS-CoV-2 might be warranted for male patients, whereas female patients might benefit from therapies that dampen innate immune activation early during disease. The immune landscape in patients with COVID-19 is considerably different between the sexes, and these differences may underlie heightened disease vulnerability in men.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2700-3.

1. Chen, N. et al. Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet 395, 507–513 (2020).
2. Li, Q. et al. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. N. Engl. J. Med. 382, 1199–1207 (2020).
3. Yang, X. et al. Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study. Lancet Respir. Med. 8, 475–481 (2020).
4. Meng, Y. et al. Sex-specific clinical characteristics and prognosis of coronavirus disease-19 infection in Wuhan, China: a retrospective study of 168 severe patients. PLoS Pathog. 16, e1008520 (2020).
5. Gebhard, C., Regitz-Zagrosek, V., Neuhauser, H. K., Morgan, R. & Klein, S. L. Impact of sex and gender on COVID-19 outcomes in Europe. Biol. Sex Differ. 11, 29 (2020).
6. Zhu, N. et al. A novel coronavirus from patients with pneumonia in China, 2019. N. Engl. J. Med. 382, 727–733 (2020).
7. WHO. WHO Director-General’s Opening Remarks at the Media Briefing on COVID-19 - 11 March 2020 https://www.who.int/dg/speeches/detail/who-director-general-s-opening-remarks-at-the-media-briefing-on-covid-19---11-march-2020 (2020).
8. Williamson, E. J. et al. Factors associated with COVID-19-related death using OpenSAFELY. Nature 584, 430–436 (2020).
9. Klein, S. L. & Flanagan, K. L. Sex differences in immune responses. Nat. Rev. Immunol. 16, 626–638 (2016).
10. Fischer, J., Jung, N., Robinson, N. & Lehmann, C. Sex differences in immune responses to infectious diseases. Infection 43, 399–403 (2015).
11. Guerra-Silveira, F. & Abad-Franch, F. Sex bias in infectious disease epidemiology: patterns and processes. PLoS One 8, e62390 (2013).
12. Moore, A. L. et al. Virologic, immunologic, and clinical response to highly active antiretroviral therapy: the gender issue revisited. J. Acquir. Immune Defic. Syndr. 32, 452–461 (2003).
13. Collazos, J., Asensi, V. & Cartón, J. A. Sex differences in the clinical, immunological and virological parameters of HIV-infected patients treated with HAART. AIDS 21, 835–843 (2007).
14. Fink, A. L., Engle, K., Ursin, R. L., Tang, W. Y. & Klein, S. L. Biological sex affects vaccine efficacy and protection against influenza in mice. Proc. Natl Acad. Sci. USA 115, 12477–12482 (2018).
15. Lucas, C. et al. Longitudinal analyses reveal immunological misfiring in severe COVID-19. Nature 584, 463–469 (2020).
16. Huang, C. et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet 395, 497–506 (2020).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© The Author(s), under exclusive licence to Springer Nature Limited 2020

Yale IMPACT Research Team*

Kelly Anastasio¹⁴, Michael H. Askenase¹⁵, Maria Batsu¹⁶, Hannah Beatty¹⁶, Santos Bermejo¹⁶, Sean Bickerton¹⁷, Kristina Brower², Molly L. Bucklin¹, Staci Cahill¹⁴, Melissa Campbell³, Yiyun Cao¹, Edward Courchaine¹⁷, Rupak Datta³, Giuseppe DeIuliis⁸, Bertie Geng¹⁶, Laura Glick¹⁶, Ryan Handoko¹⁶, Chaney Kalinich², William Khoury-Hanold¹, Daniel Kim¹, Lynda Knaggs¹⁶, Maxine Kuang¹⁴, Eriko Kudo¹, Joseph Lim¹⁸, Melissa Linehan¹, Alice Lu-Culligan¹, Amyn A. Malik¹¹, Anjelica Martin¹, Irene Matos¹⁶, David McDonald¹⁶, Maksym Minasyan¹⁶, Subhasis Mohanty³, M. Catherine Muenker², Nida Naushad¹⁶, Allison Nelson¹⁶, Jessica Nouws¹⁶, Marcella Nunez-Smith¹⁹, Abeer Obaid¹⁶, Isabel Ott², Hong-Jai Park¹⁶, Xiaohua Peng¹⁶, Mary Petrone², Sarah Prophet²⁰, Harold Rahming¹⁶, Tyler Rice¹, Kadi-Ann Rose¹⁶, Lorenzo Sewanan¹⁶, Lokesh Sharma⁸, Denise Shepard¹⁶, Erin Silva¹⁶, Michael Simonov¹⁶, Mikhail Smolgovsky¹⁶, Eric Song¹, Nicole Sonnert¹, Yvette Strong¹⁶, Codruta Todeasa¹⁶, Jordan Valdez¹⁶, Sofia Velazquez¹⁵, Pavithra Vijayakumar¹⁶, Haowei Wang³, Annie Watkins², Elizabeth B. White² & Yexin Yang¹

¹⁴Yale Center for Clinical Investigation, Yale University School of Medicine, New Haven, CT, USA. ¹⁵Department of Neurology, Yale University School of Medicine, New Haven, CT, USA. ¹⁶Yale School of Medicine, New Haven, CT, USA. ¹⁷Department of Biochemistry and of Molecular Biology, Yale University School of Medicine, New Haven, CT, USA. ¹⁸Yale Viral Hepatitis Program, Yale University School of Medicine, New Haven, CT, USA. ¹⁹Equity Research and Innovation Center, Yale University, New Haven, CT, USA. ²⁰Department of Molecular, Cellular and Developmental Biology, Yale University School of Medicine, New Haven, CT, USA.

Methods were positive, these samples were excluded from the analyses. In some
HCWs, samples were collected for the assays at up to two time points. In
No statistical methods were used to predetermine sample size. The these cases, if the data for a certain type of assay were available for both of
experiments were not randomized. Investigators were blinded during these time points, only the first time point data were used and otherwise
experiments in terms of the sex or other clinical background informa- data for either time point were used in the main analyses with cohort A.
tion, with the sample labels having de-identified patient IDs that did
not contain any of this information. Viral RNA measurement
SARS-CoV-2 RNA concentrations were measured from nasopharyngeal
Ethics statement samples and saliva samples by RT–PCR as previously described18,19. In
This study was approved by Yale Human Research Protection Program short, total nucleic acid was extracted from 300 μl of viral transport
Institutional Review Boards (FWA00002571, Protocol ID. 2000027690). media from the nasopharyngeal swab or 300 μl of whole saliva using
Informed consent was obtained from all enrolled patients and health- the MagMAX Viral/Pathogen Nucleic Acid Isolation kit (ThermoFisher
care workers. Scientific) using a modified protocol and eluted into 75 μl of elution
buffer19. For SARS-CoV-2 RNA detection, 5 μl of RNA template was tested
Patients and HCWs as previously described18, using the US CDC real-time RT–PCR primer/
Adult patients (≥18 years old) admitted to Yale-New Haven Hospital probe sets for 2019-nCoV_N1, 2019-nCoV_N2, and the human RNase P
between 18 March and 9 May 2020, positive for SARS-CoV-2 by RT–PCR (RP) as an extraction control. Virus RNA copies were quantified using
from nasopharyngeal and/or oropharyngeal swabs, and able to provide a tenfold dilution standard curve of RNA transcripts that we previ-
informed consent (surrogate consent accepted) were eligible for the ously generated18. If the RNA concentration was lower than the limit
Yale IMPACT Biorepository study, and 198 patients were enrolled in this of detection (ND) that was determined previously18, the value was set
period. All patients necessitated hospitalization for their symptoms and to 0 and used for the analyses.
had an WHO score17 of at least 3 at admission (denoting hospitalized,
mild disease). At the initial screening, clinical PCR tests were performed Isolation of plasma
in CLIA-certified laboratory and only the PCR-positive patients were Plasma samples were collected after whole blood centrifugation at
enrolled. Only after the confirmation of PCR-positivity, the patients 400g for 10 min at room temperature with brake off. The plasma was
were enrolled and the first time point samples for this study were col- then carefully transferred to 15-ml conical tubes and then aliquoted
lected for each patient. The first time point samples were collected at and stored at −80 °C for subsequent analysis.
11.4 ± 8.1, 10.2 ± 6.3, 11.7 ± 7.2 and 12.1 ± 7.3 (mean ± s.d.) DFSO in cohort A
female, cohort A male, cohort B female and cohort B male, respectively SARS-CoV-2 specific antibody measurement
(Extended Data Fig. 1a, right panel for cohort A, Extended Data Table 1). ELISAs were performed as previously described20. In short, Triton X-100
Among these patients, we could obtain whole blood for flow cytom- and RNase A were added to serum samples at final concentrations of
etry analysis using fresh PBMCs, plasma for cytokine or chemokine 0.5% and 0.5 mg ml−1 respectively and incubated at room temperature
measurements, anti-S1 antibody measurements and nasopharyngeal for 30 min before use to reduce risk from any potential virus in serum.
swab and saliva from total of 98 individuals for the present study. For Then, 96-well MaxiSorp plates (Thermo Scientific 442404) were coated
longitudinal analyses, biospecimens (blood, nasopharyngeal swabs, with 50 μl per well of recombinant SARS-CoV-2 S1 protein (ACROBiosys-
saliva, urine, and/or stool) were collected at study enrolment (baseline) tems S1N-C52H3-100 μg) at a concentration of 2 μg ml−1 in PBS and were
and on average every 3 to 7 days while in the hospital in 48 of these 98 patients.
The patients were assessed with a locally developed clinical scoring system for disease severity: (1) admitted and observed without supplementary oxygen; (2) required ≤ 3 l supplementary oxygen via nasal cannula to maintain SpO2 > 92%; (3) received tocilizumab, which per hospital treatment protocol required that the patient require >3 l supplementary oxygen to maintain SpO2 > 92%, or require >2 l supplementary oxygen to maintain SpO2 > 92% and have a high-sensitivity C-reactive protein (CRP) > 70; (4) the patient required ICU-level care; (5) the patient required intubation and mechanical ventilation. In relation to the WHO scoring17, our clinical scores 1, 2/3, 4 and 5 largely correspond to WHO scores 3, 4, 5 and 6/7, respectively. Detailed demographic information for the entire cohort (98 cohort B patients, and several time-point samples from 54 patients among them) and of cohort A (39 patients) are shown in Extended Data Tables 1–3. For the patients who are 90 years old or older, their ages were protected health information, and ‘90’ was put as the surrogate value for the analyses. Among 198 total patients enrolled in the IMPACT study in this period, we obtained whole blood, nasopharyngeal swabs or saliva samples from 98 patients for the present study. Individuals with active chemotherapy against cancers, pregnant patients, patients with background haematological abnormalities, patients with autoimmune diseases and patients with a history of organ transplantation and on immunosuppressive agents were excluded from this study.
As a control group, COVID-19-uninfected HCWs from Yale-New Haven Hospital were enrolled. HCWs were tested every 2 weeks for PCR and serology. For the control group, the PBMCs and plasma analysis were done when both tests were negative. That is, if either or both of these tests

incubated overnight at 4 °C. The coating buffer was removed, and plates were incubated for 1 h at room temperature with 200 μl of blocking solution (PBS with 0.1% Tween-20, 3% milk powder). Serum was diluted 1:50 in dilution solution (PBS with 0.1% Tween-20, 1% milk powder) and 100 μl of diluted serum was added for two hours at room temperature. Plates were washed three times with PBS-T (PBS with 0.1% Tween-20) and 50 μl of HRP anti-Human IgG Antibody (GenScript A00166, 1:5,000) or anti-Human IgM-Peroxidase Antibody (Sigma-Aldrich A6907, 1:5,000) diluted in dilution solution were added to each well. After 1 h of incubation at room temperature, plates were washed six times with PBS-T. Plates were developed with 100 μl of TMB Substrate Reagent Set (BD Biosciences 555214) and the reaction was stopped after 12 min by the addition of 100 μl of 2 N sulfuric acid. Plates were then read at wavelengths of 450 nm and 570 nm.
The cut-off values for sero-positivity were determined as 0.392 and 0.436 for anti-S1-IgG and anti-S1 IgM, respectively. Eighty pre-pandemic plasma samples were assayed to establish the negative baselines, and these values were statistically determined with confidence level of 99%.

Cytokine and chemokine measurement
Patients’ sera isolated as above were stored at −80 °C until the measurement of the cytokines. The sera were shipped to Eve Technologies on dry ice, and levels of 71 cytokines and chemokines were measured with Human Cytokine Array/Chemokine Array 71-Plex Panel (HD71). All the samples were measured upon the first thaw.
The shipment of the samples and measurements were done in two separate batches, but the measurements were performed with the same assay kits using the same standard curves, therefore minimizing the batch effects between the measurements.
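The ELISA section reports sero-positivity cut-offs of 0.392 (anti-S1 IgG) and 0.436 (anti-S1 IgM), derived from 80 pre-pandemic samples at a 99% confidence level, but does not state the formula. The following is a minimal sketch of one common way such a cut-off could be computed, assuming a one-sided normal bound (mean of the negative controls plus z times their standard deviation). The function names and the example absorbance values are hypothetical and are not taken from the study.

```python
from statistics import NormalDist, mean, stdev


def seropositivity_cutoff(negative_ods, confidence=0.99):
    """Return an absorbance cut-off from negative-control readings.

    Assumption (not stated in the paper): cut-off = mean + z * SD,
    where z is the one-sided standard normal quantile for `confidence`.
    """
    z = NormalDist().inv_cdf(confidence)
    return mean(negative_ods) + z * stdev(negative_ods)


def is_seropositive(od, cutoff):
    """Classify a sample as sero-positive if its reading exceeds the cut-off."""
    return od > cutoff


# Hypothetical pre-pandemic (negative) absorbance readings, for illustration only.
example_negatives = [0.10, 0.20, 0.30]
example_cutoff = seropositivity_cutoff(example_negatives)
print(round(example_cutoff, 3), is_seropositive(0.50, example_cutoff))
```

With real data, the list of negatives would hold one absorbance value per pre-pandemic sample, and the resulting cut-off would be compared against each patient reading.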
For the out of range values of the measurements, either the lowest or highest extrapolatable values or the lowest or highest standard curve values were recorded following the instructions of the HD71 assay, and included in the analyses. Among all the samples measured, we found that two samples had outlier values (beyond 1.5× interquartile range) in more than half of the 71 cytokines or chemokines measured, suggesting technical error and/or poor sample quality in the measurements. Therefore, cytokine or chemokine data of these individuals were excluded from the analyses.

Isolation of PBMCs
The PBMCs were isolated from heparinized whole blood using Histopaque density gradient in a biosafety level 2+ facility. To isolate PBMCs, blood diluted 1:1 in PBS was layered over Histopaque in a SepMate tube and centrifuged for 10 min at 1,200g. The PBMC layer was collected by quickly pouring the content into a new 50-ml tube. The cells were washed twice with PBS to remove any remaining Histopaque and to remove platelets. The pelleted cells were treated with ACK buffer for red cell lysis and then counted. The percentage viability was estimated using Trypan blue staining.

Flow cytometry
Using the freshly isolated PBMCs, the staining was performed in three separate panels for (1) PBMC cell composition, (2) T cell surface staining, and (3) T cell intracellular staining. Exact antibody clones and vendors that were used for flow cytometric analysis are as follows: BB515 anti-HLA-DR (G46-6), BV785 anti-CD16 (3G8), PE-Cy7 anti-CD14 (HCD14), BV605 anti-CD3 (UCHT1), BV711 anti-CD19 (SJ25C1), BV421 anti-CD11c (3.9), AlexaFluor647 anti-CD1c (L161), Biotin anti-CD141 (M80), PE anti-CD304 (12C2), APCFire750 anti-CD11b (ICRF44), PerCP/Cy5.5 anti-CD66b (G10F5), BV785 anti-CD4 (SK3), APCFire750 or PE-Cy7 or BV711 anti-CD8 (SK1), BV421 anti-CCR7 (G043H7), AlexaFluor 700 anti-CD45RA (HI100), PE anti-PD1 (EH12.2H7), APC anti-TIM-3 (F38-2E2), BV711 anti-CD38 (HIT2), BB700 anti-CXCR5 (RF8B2), PE-Cy7 anti-CD127 (HIL-7R-M21), PE-CF594 anti-CD25 (BC96), BV711 anti-CD127 (HIL-7R-M21), BV421 anti-IL-17a (N49-653), AlexaFluor 700 anti-TNF (MAb11), PE or APC/Fire750 anti-IFNγ (4S.B3), FITC anti-GranzymeB (GB11), AlexaFluor 647 anti-IL-4 (8D4-8), BB700 anti-CD183/CXCR3 (1C6/CXCR3), PE-Cy7 anti-IL-6 (MQ2-13A5), PE anti-IL-2 (5344.111), BV785 anti-CD19 (SJ25C1), BV421 anti-CD138 (MI15), AlexaFluor700 anti-CD20 (2H7), AlexaFluor 647 anti-CD27 (M-T271), PE/Dazzle594 anti-IgD (IA6-2), PE-Cy7 anti-CD86 (IT2.2), APC/Fire750 anti-IgM (MHM-88), BV605 anti-CD24 (M1/69), APC/Fire 750 anti-CD10 (HI10a), BV421 anti-CD15 (SSEA-1), AlexaFluor 700 Streptavidin (ThermoFisher). Freshly isolated PBMCs were plated at 1 × 10^6–2 × 10^6 cells in a 96-well U-bottom plate. Cells were resuspended in Live/Dead Fixable Aqua (ThermoFisher) for 20 min at 4 °C. Following a wash, cells were then blocked with Human TruStain FcX (BioLegend) for 10 min at room temperature. Cocktails of desired staining antibodies were directly added to this mixture for 30 min at room temperature. For secondary stains, cells were washed and supernatant aspirated; to each cell pellet, a cocktail of secondary markers was added for 30 min at 4 °C. Before analysis, cells were washed and resuspended in 100 μl of 4% paraformaldehyde for 30 min at 4 °C. For intracellular cytokine staining following stimulation, cells were resuspended in 200 μl cRPMI (RPMI-1640 supplemented with 10% FBS, 2 mM l-glutamine, 100 U ml−1 penicillin, and 100 mg ml−1 streptomycin, 1 mM sodium pyruvate, and 50 μM 2-mercaptoethanol) and stored at 4 °C overnight. Subsequently, these cells were washed and stimulated with 1× Cell Stimulation Cocktail (eBioscience) in 200 μl cRPMI for 1 h at 37 °C. Directly to this, 50 μl of 5× Stimulation Cocktail (plus protein transport inhibitor) (eBioscience) was added for an additional 4 h of incubation at 37 °C. After stimulation, cells were washed and resuspended in 100 μl of 4% paraformaldehyde for 30 min at 4 °C. To quantify intracellular cytokines, these samples were permeabilized with 1× Permeabilization Buffer from the FOXP3/Transcription Factor Staining Buffer Set (eBioscience) for 10 min at 4 °C. All further staining cocktails were made in this buffer. Permeabilized cells were then washed and resuspended in a cocktail containing Human TruStain FcX (BioLegend) for 10 min at 4 °C. Finally, intracellular staining cocktails were directly added to each sample for 1 h at 4 °C. After this incubation, cells were washed and prepared for analysis on an Attune NXT (ThermoFisher). Data were analysed using FlowJo software v.10.6 (Tree Star). The sets of markers used to identify each subset of cells are summarized in Extended Data Table 7, and gating strategies for the key cell populations presented in the main figures are shown in Extended Data Fig. 3a–c. For most samples, all available staining panels were implemented and analysed. The few exceptions pertained to samples for which a mechanical malfunction occurred, which depleted the sample before acquisition, or to samples with poor staining quality. In these cases, data for these samples or panels were missing and not available. All the data available were used for the analyses. The data used to generate figures and tables can be found in Supplementary Table 1, and the raw fcs files are available at ImmPort as described in the 'Data Availability' section.

Statistical analysis for the primary analyses
For the primary analyses shown in the main figures, GraphPad Prism (v.8.0) was used for all statistical analysis. Unless otherwise noted, one-way ANOVA with Bonferroni’s multiple comparison test was used for the comparisons between M_Pt versus F_Pt, M_Pt versus M_HCW, F_Pt versus F_HCW, and M_HCW versus F_HCW. For two-group comparisons including the comparison between stabilized group and deteriorated group in each sex (Fig. 4a–d), two-sided unpaired t-test was used. Bioconductor R (v.3.6.3) package ggplot2 (v.3.3.0) was used to generate heat maps (Extended Data Fig. 2), X–Y graphs for correlation analyses (Figs. 2e, 4f), and Pearson correlation heat maps (Fig. 4e).

Statistical analysis for the secondary analyses
All multivariable analyses were conducted using R v.3.6.1 (for data cleaning) and SAS version 9.4 (Cary, NC; for data analysis). The code used for data cleaning and data analysis is available at https://github.com/muhellingson/covid_immresp. We conducted longitudinal analyses of the differences in immune response by sex for patients with COVID-19 and differences in immune response between patients with COVID-19 and HCWs by sex and adjusted linear regression to evaluate differences in immune response by sex at baseline and the differences in immune response by sex and patient trajectory.

Longitudinal difference in immune response in all patients positive for COVID-19 (cohort B) by sex. A marginal linear model was fitted to evaluate the difference in various immune responses (outcome) in patients by sex (exposure). We used an auto-regressive correlation structure to account for correlation between repeated observations in an individual over time. To account for the small sample size and unequal follow-up between participants, we used the Morel–Bokossa–Neerchal (MBN) correction. In addition to sex, the model contained time-independent terms for age (in years) and BMI and time-dependent terms for days from symptom onset (self-reported), ICU status (as a proxy for disease severity) and treatment with either tocilizumab or corticosteroids. A patient was defined as ‘on tocilizumab’ at a given time point if they had received the treatment within 14 days before the time the sample was taken. Patients were defined as ‘on corticosteroids’ if they had received the treatment on the same day the sample was taken. The resulting regression coefficients were interpreted as the difference in the adjusted least square means immune response between female and male patients.

Difference in immune response between patients with COVID-19 (cohort A) and HCWs by sex at baseline. We used linear regression to evaluate the difference in immune response between female and
male patients at the first time point for those patients who had not received corticosteroids or tocilizumab before enrolment (cohort A). The model contained terms for sex, patient trajectory (worsened versus stable), age, BMI, and an interaction term for sex and group (patient versus HCWs). We calculated the least square means for each group (female patients who worsened, female patients who stabilized, male patients who worsened and male patients who stabilized) and evaluated the differences in the least square means of the different immune response outcomes by group and sex. P values and 95% confidence intervals were calculated with a Tukey correction for multiple pairwise comparisons. The regression coefficient of the interaction term between sex and group was interpreted as the difference-in-differences of the two comparisons by sex or by group (for example, the difference-in-differences between female and male patients and female and male HCWs).

Longitudinal difference in immune response between all patients with COVID-19 (cohort B) and HCWs by sex. We used a marginal linear model with a compound symmetric correlation structure and the MBN correction to evaluate the difference in immune responses between patients and HCWs by sex, controlling for age and BMI. We calculated the least square means for each group (female patients, female HCWs, male patients, male HCWs) and evaluated the differences in adjusted least square means to compare study groups by sex (female patients versus male patients, female HCWs versus male HCWs, female patients versus female HCWs and male patients versus male HCWs). P values and 95% confidence intervals were corrected using the Tukey correction for multiple pairwise comparisons. The regression coefficient of the interaction term between sex and study group was interpreted as the difference-in-differences between the two comparisons by sex or by group.

Multivariable patient trajectory analysis. We used linear regression to evaluate the difference in baseline immune response between patients who worsened after the baseline sample was taken and those who stabilized by sex. The model contained terms for sex, patient trajectory (worsened versus stable), age, days from symptom onset and an interaction term for sex and patient trajectory. We calculated the adjusted least square means for each group (female patients who worsened, female patients who stabilized, male patients who worsened and male patients who stabilized) and evaluated the differences in least square means of the different immune responses by patient trajectory and sex using the Tukey correction for multiple comparisons. The regression coefficient of the interaction term between sex and patient trajectory was interpreted as the difference-in-differences between the two patient trajectories by sex or sex by the two patient trajectories.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
All of the background information of HCWs, clinical information of patients, and raw data used in this study are included in the Supplementary Table 1. In addition, all of the raw fcs files for the flow cytometry analysis are uploaded in ImmPort (https://www.immport.org/shared/home, study ID: SDY1648).

17. WHO. R&D Blueprint Novel Coronavirus COVID-19 Therapeutic Trial Synopsis https://www.who.int/blueprint/priority-diseases/key-action/COVID-19_Treatment_Trial_Design_Master_Protocol_synopsis_Final_18022020.pdf (2020).
18. Vogels, C. B. F. et al. Analytical sensitivity and efficiency comparisons of SARS-CoV-2 RT–qPCR primer–probe sets. Nat. Microbiol. 5, 1299–1305 (2020).
19. Wyllie, A. L. et al. Saliva or nasopharyngeal swab specimens for detection of SARS-CoV-2. N. Engl. J. Med. https://doi.org/10.1056/NEJMc2016359 (2020).
20. Amanat, F. et al. A serological assay to detect SARS-CoV-2 seroconversion in humans. Nat. Med. 26, 1033–1036 (2020).

Acknowledgements We thank M. Linehan for technical and logistical assistance. This work was supported by the Women’s Health Research at Yale Pilot Project Program (A.I., A.M.R.), Fast Grant from Emergent Ventures at the Mercatus Center, Mathers Foundation, the Beatrice Kleinberg Neuwirth Fund, Yale Institute for Global Health, and the Ludwig Family Foundation. IMPACT received support from the Yale COVID-19 Research Resource Fund. A.I. is an Investigator of the Howard Hughes Medical Institute. C.B.F.V. is supported by NWO Rubicon 019.181EN.004. A.M. is supported by NIH grant R37AI041699.

Author contributions A.I., S.B.O. and A.I.K. conceived the study. C.L., P.W., J.K., J. Silva, T.M. and J.E.O. defined parameters for flow cytometry experiments, collected and processed patient PBMC samples. P.W. acquired and analysed the flow cytometry data. B.I., J.K., C.L. and C.D.O. collected epidemiological and clinical data. F.L., A.M., J. Sun, E.Y.W. and A.M.R. acquired and analysed ELISA data. A.L.W., C.B.F.V., I.M.O., R.E., S.L., P.L., A.V., A.P. and M.T. performed the virus RNA concentration assays. N.D.G. supervised viral RNA concentration assays. A.C.-M. and A.J.M. processed and stored patient specimens, J.B.F., C.D.C. and S.F. assisted in patient recruitment, W.L.S. supervised clinical data management, A.S. coordinated and secured funding for PBMC collection. T.T. designed the analysis scheme, analysed and interpreted the data for the baseline analyses. M.K.E. and S.B.O. designed the analysis scheme, and interpreted the data for the longitudinal analyses. M.K.E. analysed the longitudinal data. T.T., M.K.E. and A.I. drafted the manuscript. A.I., A.M.R. and S.B.O. revised the manuscript. A.I. secured funds and supervised the project. Authors from the Yale IMPACT Research Team contributed to collection and storage of patient samples, as well as the collection of the patients’ epidemiological and clinical data.

Competing interests The authors declare no competing interests.

Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2700-3.
Correspondence and requests for materials should be addressed to A.I.
Peer review information Nature thanks Petter Brodin, Malik Peiris and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Reprints and permissions information is available at http://www.nature.com/reprints.
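The secondary analyses in the Methods interpret the coefficient of a sex-by-group interaction term as a difference-in-differences. The following minimal sketch illustrates that reading on a toy data set, assuming an unadjusted linear model with only sex, group and their interaction. The data values and helper names are hypothetical, the covariates used in the study (age, BMI, days from symptom onset) are omitted, and this is not the SAS code used by the authors.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations X'X beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def cell_mean(data, female, patient):
    """Mean response for one sex-by-group cell."""
    values = [v for f, p, v in data if f == female and p == patient]
    return sum(values) / len(values)


# Hypothetical toy data: (female indicator, patient indicator, immune response).
toy_data = [
    (1, 1, 5.0), (1, 1, 7.0),   # female patients
    (0, 1, 9.0), (0, 1, 11.0),  # male patients
    (1, 0, 2.0), (1, 0, 4.0),   # female HCWs
    (0, 0, 3.0), (0, 0, 5.0),   # male HCWs
]

# Design matrix columns: intercept, sex, group, sex x group interaction.
design = [[1.0, f, p, f * p] for f, p, _ in toy_data]
response = [v for _, _, v in toy_data]
intercept, b_sex, b_group, b_interaction = fit_ols(design, response)

# Difference-in-differences computed directly from the four cell means.
did_from_means = (
    (cell_mean(toy_data, 1, 1) - cell_mean(toy_data, 0, 1))
    - (cell_mean(toy_data, 1, 0) - cell_mean(toy_data, 0, 0))
)
print(b_interaction, did_from_means)
```

In this model without covariates, the fitted interaction coefficient equals the female-minus-male difference among patients minus the same difference among HCWs. With covariates in the model, the same interpretation applies to the adjusted least square means rather than the raw cell means.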
Extended Data Fig. 1 | Comparison of basic clinical parameters of cohort A patient samples and plasma levels of 71 cytokines and chemokines at the first sampling of cohort A. a, Comparisons of age, BMI and DFSO at the first sampling between male and female patients in cohort A. n = 17 and 22 for M_Pt and F_Pt, respectively. b, Comparison of the plasma levels of 71 cytokines and chemokines. n = 15, 28, 16 and 19 for M_HCW, F_HCW, M_Pt and F_Pt, respectively. Data are mean ± s.e.m. P values were determined by unpaired two-tailed t-test (a) or one-way ANOVA with Bonferroni multiple comparison test (b). All P values < 0.10 are shown.
Extended Data Fig. 2 | Heat maps of cytokines and chemokines, PBMC composition, T cell subsets, and T cell cytokine expression at the first sampling of cohort A patients. a, A heat map of the plasma levels (pg ml−1) of 71 cytokines and chemokines. n = 15, 28, 16 and 19 for M_HCW, F_HCW, M_Pt and F_Pt, respectively. b, A heat map for the composition of PBMCs (percentage in live PBMCs). n = 6, 42, 16 and 21 for M_HCW, F_HCW, M_Pt and F_Pt, respectively. c, A heat map for the T cell subsets (percentage in CD3+ cells). n = 6, 45, 16 and 22 for M_HCW, F_HCW, M_Pt and F_Pt, respectively. d, A heat map for the intracellular cytokine staining of T cells (percentage in CD3+ cells). n = 6, 43, 16 and 22 for M_HCW, F_HCW, M_Pt and F_Pt, respectively. In all of these heat maps, log-transformed values were used for heat map generation.
Extended Data Fig. 3 | Flow cytometry gating strategy. a–c, Gating strategy used for monocytes (a), CD38+HLA-DR+ and PD-1+TIM-3+ CD4 or CD8 T cells (b), and
T cell intracellular staining for IFNγ+ CD8 T cells (c).
Extended Data Table 1 | Demographic and clinical characteristics of cohort A, cohort B and HCW comparison groups
*Cells may not sum to total owing to missing data.

†Categories not mutually exclusive.
‡Grey areas indicate that data are not available or not applicable.
Extended Data Table 2 | Background and sample information of 39 cohort A patients
*Ethnicity: (1) American Indian/Alaskan native; (2) Asian; (3) Black/African American; (4) Native Hawaiian/Pacific Islander; (5) White; (6) Hispanic; (9) Multiple; (98) Unknown/unavailable.
†COVID-related risk factors: (0) No; (1) cancer treatment within 1 year; (2) chronic heart disease; (3) hypertension; (4) chronic lung disease (asthma, chronic obstructive pulmonary disease
(COPD) and interstitial lung disease (ILD)); (5) immunosuppression.
‡C1, clinical score at the first sample collection date.
§Days from symptom onset at the first sample collection.
||Cmax, maximum clinical score during the admission after the first time point sample collection.
¶Days from symptom onset at the first day Cmax was recorded in deteriorated patients.
#Collected sample or data types at the first sample collection date.
E, plasma cytokine/chemokine ELISA; F1, flow cytometry PBMC cell composition staining; F2, flow cytometry T cell surface staining; F3, flow cytometry T cell intracellular staining; G, plasma
anti-S1-IgG; M, plasma anti-S1-IgM; N, nasopharyngeal viral load; S, saliva viral load.
Extended Data Table 3 | Adjusted least square means difference in immune response at baseline between male and female
patients with COVID-19 in cohort A and male and female HCW controls
*Adjusted for age and BMI.

†A450 nm; nPT_F = 20, nPt_M = 15, nHCW_F = 74, nHCW_M = 13.
‡A450 nm; nPT_F = 20, nPt_M = 15, nHCW_F = 18, nHCW_M = 3.
§log10(pg ml−1); nPT_F = 19, nPt_M = 16, nHCW_F = 28, nHCW_M = 15.
||As percentage of live cells, unless otherwise indicated; nPT_F = 21, nPt_M = 16, nHCW_F = 51, nHCW_M = 6.
¶nPT_F = 33, nPt_M = 40, nHCW_F = 51, nHCW_M = 6.
#As a percentage of CD3-positive cells; nPT_F = 21, nPt_M = 16, nHCW_F = 51, nHCW_M = 6.
P values were determined using two-sided t-test with Tukey correction for multiple pairwise comparisons.
Extended Data Table 4 | Adjusted least square means difference over time in immune response between male and female
patients with COVID-19 in cohort B
*Adjusted for age, BMI, days from symptom onset, tocilizumab treatment, corticosteroid treatment and ICU status.
†log10(SARS-CoV-2 copies per ml); nasopharyngeal nPT_F = 33, nPt_M = 30; saliva nPT_F = 20, nPt_M = 18.
‡OD450; nPT_F = 44, nPt_M = 39.
§log10(pg ml−1); nPT_F = 48, nPt_M = 43.
||As a percentage of live cells, unless indicated otherwise; nPT_F = 46, nPt_M = 42.
¶nPT_F = 33, nPt_M = 40.
#As a percentage of CD3-positive cells; nPT_F = 49, nPt_M = 42.
P values were determined using two-sided t-test and Morel–Bokossa–Neerchal correction.
Extended Data Table 5 | Adjusted least square means difference over time in immune response between male and female
patients with COVID-19 in cohort B and male and female healthy HCW controls
*Adjusted for age and BMI.

†A450 nm; nPT_F = 44, nPt_M = 39, nHCW_F = 74, nHCW_M = 13.
‡OD450; nPT_F = 44, nPt_M = 39, nHCW_F = 18, nHCW_M = 3.
§log10(pg ml−1); nPT_F = 48, nPt_M = 43, nHCW_F = 28, nHCW_M = 15.
||As a percentage of live cells, unless otherwise indicated; nPT_F = 46, nPt_M = 42, nHCW_F = 51, nHCW_M = 6.
¶nPT_F = 33, nPt_M = 40, nHCW_F = 51, nHCW_M = 6.
#As a percentage of CD3-positive cells; nPT_F = 49, nPt_M = 42, nHCW_F = 51, nHCW_M = 6.
P values were determined using two-sided t-test with Tukey correction for multiple pairwise comparisons.
Extended Data Table 6 | Adjusted least square means difference between male and female patients with COVID-19 in cohort
A by patient trajectory
*Adjusted for age and days from symptom onset.

†As percentage of CD3-positive cells unless otherwise indicated. nF_deteriorated = 6, nM_deteriorated = 6, nF_stabilized = 16, nM_stabilized = 11.
‡As a percentage of live cells.
§log10(pg ml−1).
P values were determined using two-sided t-test with Tukey correction for multiple comparisons.
Extended Data Table 7 | Definitions of each cell subset in flow cytometry with specific markers

Article
Tunable dynamics of B cell selection in gut germinal centres

https://doi.org/10.1038/s41586-020-2865-9

Carla R. Nowosad1,2,4, Luka Mesin1,4, Tiago B. R. Castro1,2, Christopher Wichmann2,3, Gregory P. Donaldson2, Tatsuya Araki1, Ariën Schiepers1, Ainsley A. K. Lockhart2, Angelina M. Bilate2, Daniel Mucida2 ✉ & Gabriel D. Victora1 ✉

Accepted: 29 July 2020
Published online: 28 October 2020

Germinal centres, the structures in which B cells evolve to produce antibodies with
high affinity for various antigens, usually form transiently in lymphoid organs in
response to infection or immunization. In lymphoid organs associated with the gut,
however, germinal centres are chronically present. These gut-associated germinal
centres can support targeted antibody responses to gut infections and immunization1.
But whether B cell selection and antibody affinity maturation take place in the face of
the chronic and diverse antigenic stimulation characteristic of these structures under
steady state is less clear2–8. Here, by combining multicolour ‘Brainbow’ cell-fate
mapping and sequencing of immunoglobulin genes from single cells, we find that
5–10% of gut-associated germinal centres from specific-pathogen-free (SPF) mice
contain highly dominant ‘winner’ B cell clones at steady state, despite rapid turnover
of germinal-centre B cells. Monoclonal antibodies derived from these clones show
increased binding, compared with their unmutated precursors, to commensal
bacteria, consistent with antigen-driven selection. The frequency of highly selected
gut-associated germinal centres is markedly higher in germ-free than in SPF mice, and
winner B cells in germ-free germinal centres are enriched in ‘public’ clonotypes found
in multiple individuals, indicating strong selection of B cell antigen receptors even in
the absence of microbiota. Colonization of germ-free mice with a defined microbial
consortium (Oligo-MM12) does not eliminate germ-free-associated clonotypes, yet
does induce a concomitant commensal-specific B cell response with the hallmarks of
antigen-driven selection. Thus, positive selection of B cells can take place in
steady-state gut-associated germinal centres, at a rate that is tunable over a wide
range by the presence and composition of the microbiota.
Our intestines are constantly exposed to large amounts of antigens derived from diet and commensal microbes. The interaction of these antigens with the immune system takes place primarily in gut-associated secondary lymphoid structures, including gut-draining mesenteric lymph nodes (mLNs) and Peyer’s patches, where gut-associated germinal centres (gaGCs) provide a site for the hypermutation of immunoglobulin genes even under steady state5,9. B cell antigen receptor (BCR)-driven selection and affinity maturation of antibodies occur efficiently in gaGCs upon oral immunization6,10. However, given previous reports that steady-state gaGCs can form in a BCR-independent fashion2,6, show little evidence of BCR-driven selection of specific antibodies at the sequence level3, and are associated instead with the selection of polyreactive immunoglobulins4, it has been postulated that gaGCs may act predominantly as diversifiers of the immunoglobulin repertoire, rather than fostering affinity maturation towards commensal microbes (reviewed in refs. 5,8). We thus sought to determine the extent to which germinal-centre selection and antibody affinity maturation occur in the midst of chronic antigenic stimulation, and to define the impact of the microbiota on these processes.

To estimate the rate of B cell selection in steady-state gaGCs, we first used in situ photoactivation of mice engineered to express photoactivatable green fluorescent protein (PA-GFP)11,12 (Fig. 1a and Extended Data Fig. 1a) to sequence B cell immunoglobulin heavy chain genes (Igh) from 20 individual gaGCs from various mLNs of 5 mice housed under SPF conditions. Clonal diversity in SPF gaGCs spanned a wide range, with a median of 33 clones per germinal centre (using the Chao1 estimator function), a D50 value (the fraction of clones accounting for 50% of sequenced cells) of 0.20, and 30% of B cells belonging to the largest clone in the germinal centre (Fig. 1b, c). One of the 20 samples sequenced contained a highly dominant clone that accounted for 64% of cells in that germinal centre (Fig. 1b, c). Analysis of somatic mutations within this clone (Fig. 1d) showed the nested expansion of nodes with increasing numbers of mutations (indicated by arrows) typical of sequential positive selection.
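The three summary statistics quoted for each germinal centre (Chao1 richness, D50 and the share of the largest clone) can be illustrated with a minimal Python sketch. It assumes that every sequenced cell has already been assigned a clone label; the function names and the toy labels are hypothetical. The bias-corrected form of Chao1 is used here so that samples without doubletons stay defined, and the D50 reading (smallest number of top-ranked clones that covers at least half of the cells, divided by the number of clones observed) is one interpretation of the definition in the text, not the authors' code.

```python
from collections import Counter


def clone_sizes(labels):
    """Return clone sizes (cells per clone), largest first."""
    return sorted(Counter(labels).values(), reverse=True)


def chao1(sizes):
    """Bias-corrected Chao1 richness: S_obs + F1*(F1-1) / (2*(F2+1))."""
    s_obs = len(sizes)
    f1 = sum(1 for s in sizes if s == 1)  # clones seen exactly once
    f2 = sum(1 for s in sizes if s == 2)  # clones seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def d50(sizes):
    """Fraction of clones needed to account for at least 50% of cells."""
    ordered = sorted(sizes, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for n_clones, size in enumerate(ordered, start=1):
        running += size
        if running >= half:
            return n_clones / len(ordered)


def largest_clone_fraction(sizes):
    """Share of sequenced cells that belong to the largest clone."""
    return max(sizes) / sum(sizes)


# Hypothetical germinal centre: 10 cells falling into 4 clones.
sizes = clone_sizes(["A"] * 5 + ["B"] * 3 + ["C", "D"])
print(chao1(sizes), d50(sizes), largest_clone_fraction(sizes))
```

A low D50 together with a high largest-clone share indicates that a few clones dominate the germinal centre, which is the pattern described for the sample with the 64% clone.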
1Laboratory of Lymphocyte Dynamics, The Rockefeller University, New York, NY, USA. 2Laboratory of Mucosal Immunology, The Rockefeller University, New York, NY, USA. 3Mucosal Immunology Group, Department of Pediatrics, University Medical Center Rostock, Rostock, Germany. 4These authors contributed equally: Carla R. Nowosad, Luka Mesin. ✉e-mail: mucida@rockefeller.edu; victora@rockefeller.edu

[Figure 1 image. Panels: a, PA-GFP photoactivation, dissection, sorting of PA+ cells and Igh sequencing (FDCs labelled with anti-CD35-Cy3); b, number of cells sequenced per germinal centre in mice SPF1–5 (dominant clone, expanded clones, singletons); c, clonal richness (clones per germinal centre, Chao1), clonal diversity (D50) and size of largest clone (cells in the germinal centre, %); d, clonal tree rooted at the unmutated ancestor (UA); e, tamoxifen (2×) fate-mapping scheme in AicdaCreERT2/+.Rosa26Confetti/Confetti mice; f, caecal-colonic mLN at days 7, 15 and 21 post-tamoxifen; g, coloured cell density, colour dominance and dominance × density (NDS) against days post-tamoxifen; h, fate-mapped germinal-centre B cells (%) against weeks post-tamoxifen, half-life 1.9 weeks (95% CI 1.1–3.8 weeks).]
Fig. 1 | Kinetics of clonal selection in steady-state gaGCs. a, Experimental setup for in situ photoactivation of gaGCs. mLNs from photoactivatable (PA), GFP-transgenic, SPF mice are photoactivated and dissected; photoactivated (PA+) cells are sorted and Igh genes sequenced. FDC, follicular dendritic cell, visualized with an antibody against Cy3-labelled CD35. b, Clonal composition of individual germinal centres from five mice (SPF1–5), obtained as in a. J, jejunal mLN; I, ileal mLN; C, caecal-colonic mLN. c, Quantification of data in b. Each symbol represents one germinal centre; medians indicated by centre lines. d, Relationships between Igh sequences of B cells derived from the largest B cell clone in b. Arrows indicate putative positive selection events. UA, unmutated ancestor. e, Experimental setup for ‘Brainbow’ fate-mapping. Steady-state gaGCs from AicdaCreERT2/+.Rosa26Confetti/Confetti mice are recombined to produce different colours by treatment with tamoxifen and followed over time (Δt). f, Representative multiphoton images of mLNs from SPF mice at different times after tamoxifen treatment. Blue is collagen (second harmonics); white is autofluorescence; other colours are from the Confetti allele. Numbers in parentheses represent normalized dominance scores (NDS; that is, the frequency of B cells in a germinal centre that carry the dominant colour after normalization for coloured cell density). Scale bars represent 200 μm in main images and 50 μm in close-ups. g, Quantification of multiple images as in f. Left, density of coloured cells (that is, fluorescent cells in the germinal-centre dark zone). Centre, colour dominance (that is, frequency of the most-common colour). Right, NDS (excludes germinal centres with density < 0.4). Each symbol represents one germinal centre. Only germinal centres with a density of more than 0.4 fluorescent cells per 100 μm² are included in the NDS calculation. Filled symbols indicate data from two mice in which tamoxifen was administered intraperitoneally. Medians are indicated. Data are from 3–5 mice per time point at days 14–35, and 1 to 2 mice for day 7 and later time points. h, SPF S1pr2CreERT2 × Rosa26Stop-tdTomato mice were given two doses of tamoxifen, two days apart, to label germinal-centre B cells. The graph shows the proportion of tdTomato+ mLN B cells assayed by flow cytometry at the time points indicated, counting from the first dose of tamoxifen. Each symbol represents one mouse. Half-lives were quantified using a one-phase exponential decay function (black line). CI, confidence interval. Gating strategies are detailed in Extended Data Fig. 1a–c.
This was confirmed in a much larger number of germinal centres using Brainbow13 multicolour fate-mapping11 (see Supplementary Information). We fate-mapped steady-state gaGC B cells in AicdaCreERT2/+.Rosa26Confetti/Confetti (AID-Confetti) mice held under SPF conditions by administering two doses of tamoxifen, two days apart (Fig. 1e). This labelled roughly 50% of B cells in both mLNs and Peyer’s patches, as estimated by the density of coloured cells in the germinal centres and by flow-cytometry experiments (Fig. 1f, g and Extended Data Fig. 1b). This fraction decreased progressively after labelling, probably as a result of germinal-centre B cells being replaced by incoming unlabelled clones as the response evolved (Fig. 1f, g). Because density estimations using Brainbow are sensitive to dropout of low-fluorescence germinal centres, we measured germinal-centre turnover by flow cytometry using the S1pr2CreERT2 BAC transgene14 crossed to the Rosa26Stop-tdTomato reporter. Pulse-labelling using this model allowed us to place the half-life of gaGC B cells at roughly two weeks under SPF conditions (Fig. 1h and Extended Data Fig. 1c).
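The half-life estimate rests on fitting the decline of labelled cells to a one-phase exponential decay, f(t) = f0 · exp(−k·t), with half-life ln 2 / k. The sketch below is a simplified stand-in for such a fit: it assumes the labelled fraction decays towards zero (no plateau term) and uses an ordinary least-squares line through ln(fraction) against time instead of a nonlinear fit. The function name and the synthetic data points are hypothetical and are not measurements from the paper.

```python
import math


def fit_half_life(times, fractions):
    """Fit ln(fraction) = ln(f0) - k * t by least squares.

    Returns (f0, k, half_life). Assumes every fraction is > 0 and that
    the decay approaches zero rather than a non-zero plateau.
    """
    logs = [math.log(f) for f in fractions]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    k = -slope                                # decay rate per unit time
    f0 = math.exp(mean_y - slope * mean_t)    # fitted fraction at t = 0
    return f0, k, math.log(2) / k


# Synthetic, noise-free example: 50% labelled at week 0, half-life 1.9 weeks.
weeks = [0, 1, 2, 4, 8]
labelled = [50 * 0.5 ** (t / 1.9) for t in weeks]
f0, k, half_life = fit_half_life(weeks, labelled)
print(round(f0, 3), round(half_life, 3))
```

With real measurements the log transform gives more weight to small fractions, so a nonlinear fit with a confidence interval would be the more careful choice.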

[Figure 2 image. Panels: a, images, pie charts and clonal trees for two high-NDS mLN germinal centres (NDS 0.76 and 0.83; major clones 88/88 and 32/44 cells), with antibodies S120, S120.U, S078 and S078.U marked; b, flow cytometry of faecal bacteria (FSC against anti-human IgG1 AF647) for secondary only, negative control (MG053 or G200), mutated and reverted S078 and S120, with summary graphs of monoclonal antibody binding (%) for mutated (M) and unmutated (U) antibodies (P = 0.016 for each); c, d, ELISA curves (A450 against monoclonal antibody concentration, nM) for ED38 (polyreactive positive control), MG053 (negative control), S078, S120, S210, S212 and the reverted S078.U and S120.U.]
Fig. 2 | Selection of commensal-binding clones in steady-state gaGCs. a, Relationships among Igh sequences from B cells of high-NDS germinal centres, sorted as in Extended Data Fig. 2a. Left, images and pie charts show clones (inner ring, with grey representing the major clone) and Brainbow colour distributions (outer ring) for each germinal centre. Numbers within images are NDS values; numbers in pie charts are numbers of cells in the major clone by the total number of cells sequenced. Right, phylogenies for major clones (grey in the pie charts). Arrows indicate ‘clonal burst’ points; names are indicated whenever a recombinant monoclonal antibody was generated from a sequence (see Supplementary Table 1). CFP, cyan fluorescent protein; RFP, red fluorescent protein; YFP, yellow fluorescent protein; UA, unmutated ancestor. Scale bars, 50 μm. Additional trees are in Extended Data Fig. 2b, c. b, Binding of monoclonal antibodies to faecal bacteria from specific-pathogen-free (SPF) mice. Gated on SYTO BC+, DAPI– live bacteria (Extended Data Fig. 2d; see Methods). Clones MG053 and G200 (see below) are non-bacteria-reactive negative controls. The graphs at the right summarize seven independent experiments (M, mutated; U, unmutated); background (percentage positive in secondary only) is subtracted from all data points. P values obtained from two-tailed Wilcoxon paired samples test. c, Binding of monoclonal antibodies to faecal bacteria, assessed by ELISA. Lines show means of three assays. A450, absorbance at 450 nm. d, As in c, but showing monoclonal antibodies S078 and S120 as well as their unmutated ancestors.
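The P = 0.016 shown for the paired comparisons in b is consistent with an exact two-tailed Wilcoxon signed-rank test in which all seven pairs change in the same direction, because the smallest attainable value with seven pairs is 2/2^7 ≈ 0.0156. The sketch below illustrates this by enumerating every possible sign assignment. It assumes no zero differences and no tied absolute differences; the function name and the binding values are hypothetical and are not data from the paper.

```python
from itertools import product


def wilcoxon_exact_two_tailed(x, y):
    """Exact two-tailed Wilcoxon signed-rank P value for paired samples.

    Assumes no ties among the absolute differences; pairs with a zero
    difference are dropped before ranking.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    # Rank the absolute differences from 1 (smallest) to n (largest).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    total = n * (n + 1) // 2
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    # Under the null hypothesis each of the 2**n sign patterns is equally likely.
    hits = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            hits += 1
    return hits / 2 ** n


# Hypothetical binding percentages for seven paired experiments.
mutated = [6.2, 5.1, 4.8, 7.0, 3.9, 5.5, 6.6]
unmutated = [1.2, 1.0, 0.9, 1.5, 0.8, 1.1, 1.3]
print(wilcoxon_exact_two_tailed(mutated, unmutated))
```

Because the enumeration grows as 2^n, this approach is only practical for small numbers of pairs such as the seven experiments summarized here.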
Despite this rapid turnover, the normalized dominance score (NDS, an estimate of the frequency of B cells in a germinal centre that carry the dominant colour11,15; see Supplementary Information) in gaGCs increased progressively to day 23 after tamoxifen, when 11% of germinal centres (6 of 57) scored 0.75 or higher (Fig. 1f, g). The strongest clonal expansions that occur in mLN germinal centres are therefore large and rapid enough to generate dominant lineages, despite the replacement of labelled clones with incoming unlabelled B cells. In germinal centres within Peyer’s patches, clonal selection progressed at a slower rate (Extended Data Fig. 1d, e), as expected from their much larger size and similar rate of turnover (Extended Data Fig. 1f, g). However, germinal centres with NDS values of more than 0.5 could occasionally be detected as early as day 14 after tamoxifen, peaking at day 23 after tamoxifen, when 15% of Peyer’s patches (3 of 20) had reached NDS values of more than 0.5 (Extended Data Fig. 1e). We conclude that clonal selection is detectable in gaGCs, despite chronic exposure to a high burden and diversity of foreign antigens and the rapid turnover of B cell clones.

To understand the relationship between clonal selection and affinity maturation in gaGCs, we used vibratome slicing of agarose-embedded AID-Confetti lymph nodes11 (Extended Data Fig. 2a) to isolate gaGCs containing ‘winner’ clones, where antigen-driven selection is most likely to have occurred11. Sequencing of Igh from B cells sorted from

[Figure 3 image. Panels: a, multiphoton images of germ-free mLN (days 14 and 20 post-tamoxifen) and Peyer’s patch (day 20 post-tamoxifen) germinal centres, annotated with NDS values; b, coloured cell density (fluorescent cells per 100 μm², fluorescent B cells %), colour dominance and dominance × density (NDS) against days post-tamoxifen for mLNs and Peyer’s patches; c, percentage of germinal centres with NDS > 0.75 in mLNs (SPF versus GF, P < 0.0001) and with NDS > 0.5 in Peyer’s patches (SPF versus GF, P = 0.0084) at days 20–23; d, e, clonal trees rooted at the unmutated ancestor (UA), with antibodies G226, M218 and M218.U marked; f, h, i, flow cytometry (anti-human IgG1 AF647); g, dot blot of antibodies M216 to M232 against Oligo-MM12 strains; j, ELISA (A450 against monoclonal antibody concentration, nM).]
Fig. 3 | Accelerated selection in gaGCs of germ-free and Oligo-MM12-colonized mice. a, Representative multiphoton images of germ-free mLNs and Peyer’s patches at different times after treatment with tamoxifen. Blue represents collagen (second harmonics); white shows autofluorescence; other colours are from the Confetti allele. Scale bars represent 200 μm (Peyer’s patch overview) and 50 μm in close-ups. Numbers in yellow are NDS values. b, Quantification of multiple images as in a for mLNs and Peyer’s patches. Left, density of coloured cells (fluorescent cells in the germinal-centre dark zone). Centre, colour dominance (frequency of the most-common colour). Right, NDS (frequency of B cells in a germinal centre that carry the dominant colour). Each symbol represents one germinal centre; medians indicated. Only germinal centres with a density of more than 0.4 fluorescent cells per 100 μm² are included in the NDS calculations. c, Proportion of germinal centres with NDS values of more than 0.75 in mLNs (top) and more than 0.5 in Peyer’s patches (bottom) under SPF and germ-free conditions at 20–23 days after tamoxifen. For SPF and germ-free mLN gaGCs, n = 57 and 27, respectively. For Peyer’s patch gaGCs, n = 20 and 9, respectively. SPF data are from Fig. 1g and Extended Data Fig. 1e. P values are from two-tailed Fisher’s exact tests. Error bars represent exact binomial 95% confidence intervals. Data for b, c are from 3–5 mice per time point, except for days 5–7 which were from 1 mouse. d, e, Relationship among Igh sequences from B cells of germ-free (d) and Oligo-MM12 (e) germinal centres with high NDS values. Details as in Fig. 2a. Additional trees are in Extended Data Fig. 6a, b. Scale bars, 50 μm. f, Flow cytometry showing the binding of monoclonal antibodies to faecal bacteria from Oligo-MM12-colonized mice, detected using anti-human IgG1 Alexa Fluor 647. ED38, polyreactive positive control monoclonal antibody; MG053, negative control. The dotted line placed at 10^2 is for reference purposes. g, Dot blot showing the binding of Oligo-MM12 monoclonal antibodies to a subset of cultured Oligo-MM12 strains (black font; see Supplementary Table 4). Oligo-MM12 strains tested are Acutalibacter muris (A.m.), Clostridium innocuum (C.i.), Enterococcus faecalis (E.f.), Muribaculum intestinale (M.i.), Flavonifractor plautii (F.p.), Clostridium clostridioforme (C.c.), Akkermansia muciniphila (Ak.m.) and Blautia coccoides (Bl.c.). Bacteroides ovatus (B.o.) and Bacteroides caccae (B.c.) are negative controls (blue font). Arrows indicate E.f., which is bound only by M218. h, i, Binding of monoclonal antibody M218 to cultured E. faecalis (h) and of three Oligo-MM12 antibodies to faecal bacteria from Oligo-MM12-colonized mice (i), as measured by flow cytometry. Gated on SYTO BC+, DAPI– live bacteria (Extended Data Fig. 2d). Data in f–i are representative of experiments carried out on at least two separate occasions. j, Binding of monoclonal antibodies to faecal bacteria, measured by ELISA. Lines show means of two assays.
such germinal centres showed evidence of ‘clonal bursts’: jackpot-type positive selection events in which multiple B cells descending from a single somatic hypermutation (SHM) variant account for a large fraction of cells in a germinal centre11 (Fig. 2a and Extended Data Fig. 2a–c). Because clonal bursts are regularly associated with the acquisition of affinity-enhancing mutations11, we produced recombinant monoclonal antibodies16 using burst-point sequences to probe for binding to commensal bacteria (Supplementary Table 1). Despite the variation inherent to bacterial flow cytometry, two of seven antibodies produced from burst-associated immunoglobulin sequences reproducibly bound faecal bacteria (Fig. 2b and Extended Data Fig. 2d, e). Binding followed different patterns: whereas monoclonal antibody S078 bound strongly to a small population of bacteria, S120 bound with moderate intensity to a much larger cohort (Fig. 2b). These two antibodies, as well as two other clones (S210 and S212), reacted with bacteria-rich centrifugation fractions, as measured by enzyme-linked immunosorbent assay

(ELISA; Fig. 2c). Only S210 and S212 showed a mild degree of polyreactivity using standard measures4,17 (Extended Data Fig. 2f). Reversion of somatic mutations in S078 and S120 resulted in decreased binding to bacteria by both flow cytometry and ELISA (Fig. 2b, d). Thus, commensal binding with the characteristic features of antigen-driven maturation is detectable in steady-state gaGCs when analysis is focused on strongly selected gaGC winner clones.

To investigate the influence of commensal diversity on gaGC selection dynamics, we rederived AID-Confetti mice into germ-free conditions, in which germinal centres still form18, as well as into stable vertical colonization with a consortium of 12 bacterial strains representing major phyla present in the mouse gut19 (Oligo-MM12; Extended Data Fig. 3). When compared with SPF mice, germ-free mice had higher frequencies of germinal-centre B cells in jejunal and ileal mLNs20, and lower frequencies in duodenal and caecal-colonic mLNs and Peyer’s patches (Extended Data Fig. 4a, b). Germ-free germinal centres were strongly skewed away from IgG2b and IgA towards IgG1 (Extended Data Fig. 4c). Colonization with Oligo-MM12 microbiota partly restored the phenotype of SPF mice, increasing germinal-centre B cell frequency in distal Peyer’s patches and IgG2b in proximal Peyer’s patches (Extended Data Fig. 4d).

Multicolour fate-mapping showed that strongly selected germinal centres accumulated at a markedly faster rate in germ-free mice than they did under SPF conditions (Fig. 3a–c). By day 23 after tamoxifen, roughly 56% of mLN germinal centres had reached an NDS of 0.75 or higher, compared with around 11% in SPF mice (Fig. 3c). This was confirmed at the Igh sequence level: 11 of 14 mLN germinal centres picked at random from vibratome slices had dominant clones that accounted for more than 50% of all B cells in the germinal centre, compared with 1 of 20 germinal centres in SPF conditions (Fig. 1a–c and Extended Data Fig. 5a–d). Faster selection of Brainbow colours was also observed in germ-free Peyer’s patches, where 6 of 9 germinal centres exceeded an NDS of 0.5 by day 20–23 after tamoxifen, compared with 3 of 20 under SPF conditions (Fig. 3a–c). Germinal-centre selection in Oligo-MM12-colonized mice fell between the SPF and germ-free rates (Extended Data Fig. 5e–g). Therefore, selection in gaGCs is not dependent on a fully diverse microbiota, and in fact becomes accelerated in the absence of commensal bacteria.

Clonal phylogenies of single-coloured germinal centres from germ-free and Oligo-MM12-colonized mice revealed strong clonal bursting, as shown by the presence of large expansions of B cells with identical variable heavy chain (VH) immunoglobulin sequences and multiple inferred descendants (Fig. 3d, e and Extended Data Fig. 6a, b, arrows). We produced monoclonal antibodies from 16 immunoglobulin sequences that were strongly selected under each condition (including those indicated by named arrows in Fig. 3d, e and Extended Data Fig. 6a, b; Supplementary Table 1). Of seven monoclonal antibodies cloned from Oligo-MM12, three (M216, M218, and M220) bound faecal bacteria fractions from Oligo-MM12-colonized mice (Fig. 3f, i, j), and one (M218) bound to cultured Enterococcus faecalis, as assessed by both flow cytometry and dot blotting (Fig. 3g, h). Reversion of somatic mutations resulted in larger decreases in binding for M216 and M220 and a modest decrease for M218, as shown by flow cytometry and ELISA (Fig. 3h–j). Thus, as with SPF microbiota, vertical colonization with Oligo-MM12 triggers efficient antigen-driven maturation towards commensals in steady-state gaGCs.

We subjected the nine monoclonal antibodies obtained from germ-free winner clones to an array of assays that covered major potential sources of antigen, including food, autoantigens (anti-nuclear antibody and intestinal tissue antigens), faecal bacteria, and a standard polyreactivity panel. None of the germ-free antibodies reacted above background levels in any of these assays (Extended Data Fig. 6c–f). To determine whether germ-free germinal centres were indeed populated in a BCR-dependent manner, or simply stochastically owing to a lack of antigenic stimulation, we searched for commonalities in the Igh sequences of winner clones, along with additional sequences obtained from single germinal-centre B cells from mLN vibratome slices and whole mLNs and Peyer’s patches of wild-type germ-free mice. This revealed substantial overlap of clonotype ‘themes’ across individuals, which we regarded as unlikely to be random given the small pool of germinal centres sampled. Two themes were particularly prevalent (Fig. 4a, b). One used the relatively rare VH1–47 segment, coupled to joining segments JH4 or JH2 via an 11–12-amino-acid heavy-chain complementarity-determining-region 3 (CDRH3) sequence that begins with the consensus sequence ARGSNY (Fig. 4a). No commonalities in light-chain usage were detected at this sampling depth. After allowing a one-amino-acid substitution in the ARGSNY motif, this theme was present in 5 of 7 germ-free mice, accounting for 16.6% of all cells sequenced (Fig. 4b and Extended Data Fig. 7a). Clones with these characteristics represented only 0.006% of all reads (1 in roughly 17,000) in a previously published database of naive B cell Igh sequences from C57BL/6 mice, containing more than 30 million reads representing 2.5 million unique rearrangements from 5 mice21 (P < 2.2 × 10^−16 compared with germ-free gaGCs).

[Figure 4 image. Panels: a, CDR H3 sequence logos (bits against position) for VH1–47/JH4 and VH1–12/JH3; b, cells sequenced per sample for mice GF1–7 and MM12 1–3, grouped as individual germinal centres (AID-Confetti), mLN slices (wild type) and whole organs (wild type), coloured by VH1–47/ARGSNY.../JH4, VH1–47/ARGSNY.../JH2, other VH1–47, VH1–12/AREGFAY/JH3, other VH1–12 and all other V segments; c, Circos plots for VH1–47/ARGSNY and VH1–12/AREGF(A/D)Y in SPF, germ-free and Oligo-MM12 mice; d, Circos plots for all public clonotypes (exact match) and for larger expansions (30 total wells or more); scale, 1,000 clone*wells.]

Fig. 4 | Prominent public clonotypes in gaGCs of germ-free and Oligo-MM12-colonized mice. a, CDR H3 amino-acid sequence logos for B cells bearing public VH1–47/JH4 or JH2 rearrangements of 11 or 12 amino acids (left) or VH1–12/JH3 rearrangements of 7 amino acids (right), sequenced from germ-free gaGCs. b, Frequency across mice or samples of B cells that fit public-clonotype criteria or carry other VH1–47 or VH1–12 rearrangements. Each bar represents one sample. Data are for 7 germ-free mice (GF1–7) and 3 Oligo-MM12-colonized mice (MM12 1–3). D, duodenal mLN; I, ileal mLN; C, caecal-colonic mLN; P, Peyer’s patch. c, Circos plots showing the distribution of VH1–47 and VH1–12 public clonotypes in a second cohort of sex- and age-matched SPF, germ-free and Oligo-MM12-colonized mice. Each segment represents pooled germinal-centre B cells from the mLNs and Peyer’s patches of one mouse, with clones ordered clockwise from largest to smallest. Only clones containing identical CDR H3s are linked (see Methods for a full description). Samples were gated as in Extended Data Fig. 9b; all samples were sequenced in a single experiment. d, Circos plots as in c, showing all public IgH clonotypes shared across mice housed under the indicated conditions. Lines connect clonotypes with the same VH/CDR H3/JH amino-acid sequence. Left, all clones; right, only those clones spanning 30 or more wells in total. In c, d, only clones that were found in at least two wells from the same mouse are linked.

A second public clonotype was encoded by the rare VH1–12 segment, with the stricter seven-amino-acid consensus CDRH3 sequence, AREGFAY, followed by JH3 (Fig. 4a). Again, no patterns of light-chain usage were identified. Allowing a one-amino-acid substitution in CDRH3, this
clonotype was present in 2 of 7 germ-free mice, representing roughly 3% of all cells sequenced. This clonotype was also heavily dominant in all 3 single-coloured germinal centres sorted from different organs of one of three Oligo-MM12-colonized mice, accounting for 171 of 191 cells sequenced from this mouse (Fig. 4b and Extended Data Fig. 8a). Despite its short length, the VH1–12/AREGFAY/JH3 combination was seen only 7 times (in 1 of 5 mice) in the more than 30 million reads of the naive B cell database21, and only twice more in one other mouse if a single amino-acid substitution in the AREGFAY motif was allowed (P < 2.2 × 10^−16 compared with germ-free gaGCs). Both clonotypes accumulated somatic mutations that converged across mice and between SPF and Oligo-MM12 conditions (Extended Data Fig. 7b). In agreement with their failure to bind food protein extracts (VH1–47 and VH1–12 clonotypes are represented by monoclonal antibodies G082/G226 and M228/M232, respectively) (Fig. 3d–j and Extended Data Fig. 6), both clonotypes were also detected in mice fed a custom-made protein-free chow formulated from purified ingredients (Extended Data Fig. 8b, c and Supplementary Table 2).

depending on the presence and complexity of the gut microbiota. Under microbial-replete conditions, selection of highly dominant clones is relatively rare and is associated with improved binding of commensal-derived antigens. At the other extreme, gaGC selection accelerates precipitously in the absence of microbes, leading to strong convergent selection of IgH clonotypes across mice. Thus, clonal selection in steady-state gaGCs is a tunable process (see Supplementary Information for further discussion). The ability to generate specific, affinity-matured responses to commensals would allow targeted control of individual bacterial species and may thus play a part in maintaining the composition of the gut microbiota.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2865-9.

1. Bergqvist, P. et al. Re-utilization of germinal centers in multiple Peyer’s patches results in highly synchronized, oligoclonal, and affinity-matured gut IgA responses. Mucosal Immunol. 6, 122–135 (2013).
2. Casola, S. et al. B cell receptor signal strength determines B cell fate. Nat. Immunol. 5, 317–327 (2004).
3. Yeap, L. S. et al. Sequence-intrinsic mechanisms that target AID mutational outcomes on antibody genes. Cell 163, 1124–1137 (2015).
4. Bunker, J. J. et al. Natural polyreactive IgA antibodies coat the intestinal microbiota. Science 358, eaan6619 (2017).
5. Reboldi, A. & Cyster, J. G. Peyer’s patches: organizing B cell responses at the intestinal frontier. Immunol. Rev. 271, 230–245 (2016).
6. Bemark, M. et al. Somatic hypermutation in the absence of DNA-dependent protein kinase catalytic subunit (DNA-PK(cs)) or recombination-activating gene (RAG)1 activity. J. Exp. Med. 192, 1509–1514 (2000).
7. Biram, A. et al. B cell diversification is uncoupled from SAP-mediated selection forces in chronic germinal centers within Peyer’s patches. Cell Rep. 30, 1910–1922 (2020).
8. Bunker, J. J. & Bendelac, A. IgA responses to microbiota. Immunity 49, 211–224 (2018).
9. Casola, S. & Rajewsky, K. B cell recruitment and selection in mouse GALT germinal centers. Curr. Top. Microbiol. Immunol. 308, 155–171 (2006).
10. Biram, A. et al. BCR affinity differentially regulates colonization of the subepithelial dome and infiltration into germinal centers within Peyer’s patches. Nat. Immunol. 20, 482–492 (2019).
11. Tas, J. M. et al. Visualizing antibody affinity maturation in germinal centers. Science 351, 1048–1054 (2016).
12. Victora, G. D. et al. Germinal center dynamics revealed by multiphoton microscopy with a photoactivatable fluorescent reporter. Cell 143, 592–605 (2010).
13. Livet, J. et al. Transgenic strategies for combinatorial expression of fluorescent proteins in the nervous system. Nature 450, 56–62 (2007).
14. Shinnakasu, R. et al. Regulated selection of germinal-center cells into the memory B cell compartment. Nat. Immunol. 17, 861–869 (2016).
15. Meyer-Hermann, M., Binder, S. C., Mesin, L. & Victora, G. D. Computer simulation of multi-color Brainbow staining and clonal evolution of B cells in germinal centers. Front. Immunol. 9, 2020 (2018).
16. Tiller, T., Busse, C. E. & Wardemann, H. Cloning and expression of murine Ig genes from single B cells. J. Immunol. Methods 350, 183–193 (2009).
17. Meffre, E. et al. Surrogate light chain expressing human peripheral B cells produce self-reactive antibodies. J. Exp. Med. 199, 145–150 (2004).

To assess whether reliance on public clonotypes is broadly characteristic of germ-free gaGCs when sampled in an unbiased manner, we developed a multiwell incidence-based approach to measure clonal overlaps between mice with high confidence. In total, we sequenced roughly 80 thousand cells from gaGCs in the mLNs and Peyer’s patches of 6 germ-free, 6 SPF and 7 Oligo-MM12-colonized mice (Extended Data Fig. 9a–d). Confirming our initial findings, the VH1–47/ARGSNY theme (regardless of JH usage) was present in 6 of 6 germ-free and 5 of 7 Oligo-MM12-colonized mice, corresponding to 4.4% (173 of 3,929) and 2.4% (98 of 4,043) of clone*wells (a number obtained by multiplying each clone by the number of wells it was found in) (Extended Data Fig. 9a–c) in each condition (Fig. 4c). One such clone was also detected in a single SPF mouse (5 of 3,629 clone*wells) (Fig. 4c), and 3 others (one using the JH1 segment) were observed in our photoactivation data (see Fig. 1 and Supplementary Spreadsheet 1), indicating that full bacterial colonization is not sufficient to completely exclude such clonotypes from gaGCs. Clonotype VH1–12/AREGFAY was present, again at a lower frequency, in 5 of 6 germ-free mice (44 of 3,929 clone*wells) (Fig. 4c). Analysis of Levenshtein distances showed that, in germ-free and Oligo-MM12-colonized but not SPF mice, CDRH3 sequences of VH1–47 and VH1–12 clones were closer in sequence to the ARGSNY and AREGFAY motifs, respectively, than were CDRH3 sequences of clones using other V segments (Extended Data Fig. 9e). Of note, both VH1–47/ARGSNY/JH4 and VH1–12/AREGFAY/JH3 rearrangements were found recurrently in Peyer’s patches of germ-free mice in independent work22 published while this paper was under review, further underscoring the public nature of these two clonotypes. Finally, public clonotypes in general (defined as any recurrent VH/JH combination with exactly matching CDRH3 amino-acid sequences) were markedly more frequent across germ-free and Oligo-MM12 gaGCs than across SPF mice (Fig. 4d and
Extended Data Fig. 9f). Although almost all germ-free-associated clo- 18. Pollard, M. in Germinal Centers in Immune Responses (eds Odartchenko, N. et al.)
notypes were either eliminated or reduced to below detection levels 343–348 (Springer, 1967).
19. Brugiroux, S. et al. Genome-guided design of a defined mouse microbiota that confers
by SPF colonization, the Oligo-MM12 gaGC repertoire overlapped more colonization resistance against Salmonella enterica serovar Typhimurium. Nat. Microbiol.
substantially with that of germ-free mice, indicating that Oligo-MM12 2, 16215 (2016).
colonization is insufficient to completely replace the germ-free 20. Esterházy, D. et al. Compartmentalized gut lymph node drainage dictates adaptive
immune responses. Nature 569, 126–130 (2019).
response (Fig. 4d and Extended Data Fig. 9g). Thus, rather than being 21. Greiff, V. et al. Systems analysis reveals high genetic and antigen-driven
stochastically populated, gaGCs display stringent selection driven by predetermination of antibody repertoires throughout B cell development. Cell Rep.
BCR specificity under conditions of low antigenic diversity, resulting 19, 1467–1478 (2017).
22. Chen, H. et al. BCR selection and affinity maturation in Peyer’s patch germinal centres.
in rapid focusing of germinal centres on dominant lineages and pro- Nature 582, 421–425 (2020).
nounced reliance on clonotypes found repeatedly across different mice.
We have shown that gut-associated germinal centres undergo clonal
selection and antigen-driven maturation in the absence of infection
or immunization, and that the rate of selection varies markedly © The Author(s), under exclusive licence to Springer Nature Limited 2020
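As a quick arithmetic check on the clone*well frequencies quoted in the results above, the following Python sketch tallies clone*wells (each clone counted once per well in which it was found) and converts motif counts into percentages. The function names and the toy observations are hypothetical and are not taken from the authors' R pipeline; only the counts 173 of 3,929 and 98 of 4,043 come from the text.

```python
from collections import defaultdict


def clone_well_counts(observations):
    """Return {clone_id: number of distinct wells containing that clone}.

    `observations` is an iterable of (clone_id, well_id) pairs; this input
    layout is an assumption made for illustration.
    """
    wells_by_clone = defaultdict(set)
    for clone_id, well_id in observations:
        wells_by_clone[clone_id].add(well_id)
    return {clone: len(wells) for clone, wells in wells_by_clone.items()}


def percent_of_clone_wells(motif_clone_wells, total_clone_wells):
    """Percentage of all clone*wells that carry a given motif."""
    return 100.0 * motif_clone_wells / total_clone_wells


# Toy example (hypothetical data): clone "A" seen in wells 1, 2 and 5,
# clone "B" seen in well 2 only. Repeat sightings in one well count once.
toy = [("A", 1), ("A", 2), ("A", 2), ("A", 5), ("B", 2)]
counts = clone_well_counts(toy)
total = sum(counts.values())  # total clone*wells in the toy data set

# Counts reported in the text for the VH1-47/ARGSNY clonotype.
germ_free_pct = percent_of_clone_wells(173, 3929)
oligo_mm12_pct = percent_of_clone_wells(98, 4043)
print(f"germ-free: {germ_free_pct:.1f}%  Oligo-MM12: {oligo_mm12_pct:.1f}%")
```

With the reported counts this gives 4.4% for germ-free and 2.4% for Oligo-MM12-colonized mice, matching the values stated in the text.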

Methods

Mice and treatments
Mice were housed at a temperature of 72 °F and humidity of 30–70% in a 12-h light/dark cycle with ad libitum access to food and water. Male and female mice aged 8–12 weeks at the start of the experiment were used. PA-GFP transgenic mice were generated in our laboratory12. Rosa26Confetti (B6.129P2-Gt(ROSA)26Sortm1(CAG-Brainbow2.1)Cle/J) and Rosa26Stop-tdTomato (B6.Cg-Gt(ROSA)26Sortm14(CAG-tdTomato)Hze/J)23 mice were obtained from Jackson Laboratories. Rosa26Confetti mice were backcrossed to C57BL/6 mice for several generations in our laboratory and restricted to the Ighb/b haplotype. AicdaCreERT2 (Aicdatm1.1(cre/ERT2)Crey) mice24 were a gift from C.-A. Reynaud and J.-C. Weill (Institut Necker, Paris). S1pr2CreERT2 (Tg(S1pr2-cre/ERT2)#Kuro) BAC-transgenic mice14 were a gift from T. Okada (Riken, Yokohama) and T. Kurosaki (Univ. Osaka). Mice were bred and maintained either in SPF facilities or in germ-free isolators at the Rockefeller University animal facility. The Oligo-MM12 consortium was a gift from K. McCoy (Univ. Calgary). We colonized germ-free AID-Confetti breeders with a single gavage of Oligo-MM12 and monitored colonization (including the presence of the entire consortium in successive generations) by specific amplification of individual bacterial members by quantitative polymerase chain reaction (qPCR; see below). Mice were bred and maintained in isolators. Vertically colonized AID-Confetti Oligo-MM12 mice were used for all experiments. To deplete protein antigen from the diet, we used a custom solid diet containing free amino acids (Modified TestDiet 9GCV with 5% cellulose; composition details are in Supplementary Table 1). Diets were irradiated at more than 45 kGy to ensure sterility for germ-free conditions and were provided to mice from one week of age until the time of analysis.
Recombination of floxed alleles in both AID-Confetti and S1PR2-Tomato mice was induced by two gavages of 10 mg tamoxifen (Sigma, catalogue number T5648) dissolved in corn oil at 50 mg ml−1, 2 days apart. To ensure that selection was not a function of the route of administration of either tamoxifen or corn oil, we also injected two AID-Confetti mice intra-peritoneally once with 10 mg tamoxifen for analysis at the SPF 21–23 day time point. In germ-free AID-Confetti mice, tamoxifen was prepared and administered under sterile conditions to mice housed in individually ventilated isocages (TecniPlast).
Sample sizes were not calculated a priori. Given the nature of the comparisons (mice born under differing microbial colonization status), mice were not randomized into each experimental group and investigators were not blinded to group allocation.
All animal procedures were approved by the Institutional Animal Care and Use Committee of the Rockefeller University.

Oligo-MM12 qPCR
Colonization of mice by the Oligo-MM12 consortium was confirmed and monitored over generations by qPCR, using primer pairs specific to each species as previously validated (individual strain primer sequences are in Supplementary Table 5, adapted from ref. 19; universal bacterial qPCR primers are as follows: UNIF340-ACTCCTACGGGAGGCAGCAGT and UNIR44R-ATTACCGCGGCTGCTGGC). DNA was extracted from faecal samples using the ZR Fecal DNA kit (Zymo Research) according to the manufacturer’s instructions. Quantitative PCR was performed with the Power SYBR Green master mix (Applied Biosystems). The average cycle threshold (Ct) value of two technical replicates was used to quantify the relative abundance of each species’ 16S ribosomal RNA using the ∆∆Ct method, with the universal 16S rRNA primers serving as controls between samples. Relative abundance was corrected according to the genome copy number of 16S rRNA for each species.

Multiphoton imaging and photoactivation
mLNs and Peyer’s patches were collected and imaged as previously described11. In brief, adipose tissue and excess epithelium were removed under a dissecting microscope, and mLNs and Peyer’s patches were mounted in phosphate-buffered saline (PBS) between two coverslips held together with vacuum grease, as previously described25. Mounted mLNs and Peyer’s patches were imaged on an Olympus FV1000 upright microscope with a ×25 Plan water-immersion objective (numerical aperture (NA) 1.05) and a Mai-Tai DeepSee titanium-sapphire laser (Spectraphysics). Confetti alleles were imaged at an excitation wavelength of λ = 930 nm. Fluorescence emission was collected in three channels, using a pair of CFP (480/40 nm) and YFP (525/50 nm) filters separated by a 505-nm dichroic mirror to detect CFP, GFP and YFP, and a dedicated RFP filter (605/70 nm). In situ photoactivation was performed as previously described12,26. PA-GFP transgenic mice were injected intravenously with 5 μg of a non-blocking antibody to CD35 (clone 8C12, produced in house) conjugated to Cy3 to label networks of follicular dendritic cells (FDCs). Clusters of CD35+ cells were identified by imaging at λ = 950 nm, where photoactivation does not take place, and three-dimensional (3D) regions of interest were photoactivated by higher-power scanning at λ = 830 nm. Fragments of lymph nodes were then processed for flow cytometry as described below.

Image analysis
Colour dominance in AID-Confetti germinal centres was determined in 3D data sets reconstructed using ImageJ software and the Bio-formats plugin. Cells of each colour or colour combination were counted manually in two or more 2D slices, at least 20 μm apart, using the Cell Counter plugin. For overviews of Peyer’s patches, colours were counted in the same way, but from a single imaging plane. The normalized dominance score (NDS) was calculated as before11. In brief, we first estimated the fraction of recombined cells in each germinal centre by calculating the density of fluorescent cells per area unit (100 μm²) in anatomically defined dz areas in which cell distribution was homogeneous, then multiplied the density by the fraction of coloured cells accounted for by the dominant colour in the germinal centre. Fully recombined germinal centres have a cell density of very close to 1 (ref. 11), making an NDS of 1 a good approximation of 100% occupancy by the dominant colour. The sizes of germinal centres in mLNs and Peyer’s patches were calculated as the cross-sectional area of the largest available z-section. All image analysis was carried out in ImageJ.

Lymphocyte flow cytometry and sorting
To evaluate the dynamics of germinal centres and the distribution of isotypes across space and time, we isolated individual mLNs that drain the duodenum, jejunum, ileum or caecum/colon as previously described20. Pairs of Peyer’s patches were isolated from the most proximal part of the duodenum or the most distal part of the ileum. Cells were isolated by maceration using disposable micropestles (Axygen) in 100 μl of PBS supplemented with 0.5% bovine serum albumin (BSA) and 1 mM EDTA (constituting PBE buffer), and single-cell suspensions were obtained by two passes through a 70-μm mesh. Cells were stained with antibodies against B220 protein, T cell antigen receptor (TCR) α/β chains (or a ‘dump’ mixture containing antibodies against CD4, CD3, CD8 and NK1.1), CD38, Fas, IgM, IgG1, IgG2b and IgA, supplemented with Fc block (see Supplementary Table 3) for 30 min on ice. Samples were run on a FACS LSRII or FACS Symphony (BD Biosciences).
To sort B cells from single mLN germinal centres, we first determined the localization of single-coloured germinal centres (see ‘Multiphoton imaging and photoactivation’ section above). As previously described11, we embedded selected mLNs in 4% low-melt NuSieve GTG agarose in PBS that had been heated to boiling then cooled to 37 °C before embedding. We then cut lymph nodes into 300-μm slices using a Leica VT1000A vibratome. Slices were further dissected under a Leica M165FC fluorescence stereomicroscope using a double-edged razor blade to isolate single germinal centres from slices in which several germinal centres were present. Slice fragments were placed in microcentrifuge tubes containing 100 μl PBE, macerated using disposable micropestles and dissociated into single-cell suspensions by gentle
vortexing. We then added 100 μl of 2× antibody stain (comprising antibodies against CD38, Fas, B220 and TCR-αβ supplemented with Fc block; see Supplementary Table 3) to the cell suspension, which was incubated on ice for 30 min. Single cells were sorted as described below using FACS Aria II or III cell sorters. Cells positive for any fluorescent Confetti colour, detected as previously described11, were index-sorted into 96-well plates containing 5 μl TCL buffer (Qiagen) supplemented with 1% β-mercaptoethanol11,27. The precise assignment of colours from index-sorted cells was carried out post-acquisition using Diva software, version 8.0.2, using all four channels.
For the isolation of single follicles from non-fluorescent germ-free mice, mice were injected with anti-CD35 (8C12) Alexa Fluor 594, individually dissected, imaged, and sliced as above. Slices of 250 μm thickness were examined under a stereomicroscope and manually microdissected into slice fragments comprising roughly one follicle each, as defined by staining of FDCs (see the section ‘Multiphoton imaging and photoactivation’ above). Two slice fragments were prepared per mLN, isolated from slices that were at least 1,000 μm (four 250-μm slices) apart in the intact node. Slice fragments were prepared for single-cell sorting as above. For the isolation of single germinal centres from fluorescent germ-free mice, we combined Confetti fate-mapping with anti-CD35 follicle staining. Germ-free AID-Confetti mice were treated with tamoxifen as described above, and, 24 h before imaging, were injected with 5 μg anti-CD35-Cy3 antibody. Mesenteric lymph nodes were sliced and imaged and fluorescent clusters were manually excised, avoiding any other follicles (marked with anti-CD35 antibody). In this case, both fluorescent and non-fluorescent germinal-centre B cells were single-sorted for sequencing.
For single-cell sequencing from whole mLNs and Peyer’s patches, organs from non-fluorescent germ-free mice were isolated and processed into single-cell suspensions by maceration using disposable micropestles. After straining through a 70-μm mesh, cells were stained and sorted as above. For incidence-based sequencing experiments, samples from mLNs and Peyer’s patches were stained as above, and sorted at 100 cells per well into 16–32 individual wells in a 96-well plate containing 10 μl TCL buffer per well (Qiagen) supplemented with 1% β-mercaptoethanol. Analysis of data from flow-cytometry experiments was carried out using FlowJo software, version 10.5.3 (Tree Star Inc.).

Immunoglobulin sequencing
RNA from single cells or 100-cell pools was reverse-transcribed using oligo-dT primers as in refs. 11,16. PCR primers were used as previously described11,28 with the addition of IgA-specific amplification primers Cα outer (ATCAGGCAGCCGATTATCAC) and Cα inner (GAGCTCGTGGGAGTGTCAGTG)16. Pooled PCR products were then purified using SPRI beads (0.7× volume ratio), gel purified and sequenced with a 500-cycle Reagent Nano kit v2 for single-cell libraries and with a 600-cycle Reagent kit v3 for 100-cell pool libraries on the Illumina Miseq platform.

Sequence analysis
For single-cell Igh analyses, raw paired-end sequences were merged at the overlapping regions using PANDAseq (version 2.11) for full amplicon reconstruction29, then processed with the FASTX toolkit. Only those sequences with high counts for each single cell/well were analysed. For annotation of V(D)J gene rearrangement, the Ig sequences obtained were submitted to both HighV-QUEST (version 1.6.9)30 and Vbase2 (ref. 31) databases, choosing the assignment that yielded the lowest number of somatic mutations in case of discrepancy. Sequences with common VH/JH genes and the same CDRH3 length were grouped into clonal lineages when CDRH3 nucleotide identity was 75% or more. Igk sequences were determined using the same method when needed for the cloning of monoclonal antibodies. Clonal lineage trees were plotted using GCtree32, with the unmutated V gene sequence of the V(D)J rearrangement used as an outgroup. Logo plots for public clonotypes were created by first using the T-Coffee algorithm33 to align all CDRH3 sequences bearing VH1–47 and JH2 or JH4 connected by an 11-amino-acid or 12-amino-acid CDRH3, or VH1–12 and JH3 connected by a 7-amino-acid CDRH3. The results of this alignment were then processed using the WebLogo3 web server34. Public clones were searched in the Greiff et al.21 database using R. Matching required exact VH, JH and CDRH3 length matches and up to one-amino-acid mismatch in the ARGSNY or AREGFAY motifs. Dendrograms were generated using ClustalX (version 2.1) and FigTree (version 1.4.4) and branches were coloured in Adobe Illustrator according to the sequence annotations.
For incidence-based sequencing, raw paired-end sequences were merged as above and submitted to HighV-QUEST (version 1.6.9) for annotation30. The output database was then processed in the R environment to remove non-functional and out-of-frame sequences. Any sequences with fewer than six reads were discarded. Expanded clonal populations were defined using Change-O35. Briefly, sequences within the same mouse that shared VH and JH genes and with a maximum CDRH3 Hamming distance of four nucleotides were grouped into a single clone*well of size equivalent to the number of wells this sequence was found in and retaining the full list of CDRH3 sequences in the cluster (generating a CDR3list for that clone*well). To remove the possibility of errors due to sequence misassignment and interwell contamination, sequences found in a single well within their mouse of origin were eliminated. CDRH3 sequences were translated to amino-acid sequences and non-CDRH3 sequences were discarded before assignment of public clones. Our definition of a public clone required that clones have the same VH and JH segments, and an exact match between any of the CDRH3 sequences in the clone*wells’ CDR3lists. Public clonotypes VH1–47/ARGSNY and VH1–12/AREGFAY were identified in this database using the same criteria as above but without restriction of JH segment usage. Circular ideogram plots were created using Circos (version 0.69-9), with each individual mouse represented by a single ideogram bar36. The full analysis pipeline is available at https://github.com/victoraLab/MIBS.
Levenshtein distances were calculated in R using the stringdist package. Wells assigned to the same clone within the same mouse were counted only once to avoid overrepresentation due to clonal expansion. All sequences with 15 or more reads were used in this analysis, regardless of the number of wells in which a clone was present. For the VH1–47/ARGSNY clonotype, distances were calculated starting from the string ARGSNYXXXXDY and plotted as the resulting difference minus 4 to correct for the 4 ‘X’ characters.

Production of monoclonal antibodies
Heavy- and light-chain sequences obtained from the same expanded nodes in somatic hypermutation (SHM) phylogenies of germinal-centre B cells, as well as their deduced unmutated ancestors, were produced and assembled into custom mammalian expression vectors (modified from ref. 16) encoding the human IgG1 and IgK constant regions (Twist Biosciences). Plasmids were transfected into Freestyle 293-F suspension cells (obtained from Life Technologies and tested for mycoplasma contamination in our laboratory), and monoclonal antibodies were purified using protein-G affinity chromatography, as previously described11,37. Integrity of all monoclonal antibodies was assayed/quantified by SDS-PAGE and biolayer interferometry on an Octet Red96 instrument using protein-G-coated sensors (FortéBio).

Assays for monoclonal antibody binding
Bacterial flow cytometry was carried out using protocols adapted from refs. 4,38. Freshly collected faeces from SPF or Oligo-MM12 mice from our own facility were macerated with micropestles in 100 μl of ice-cold PBS per 10 mg of faeces and vortexed for 5 min. Large debris was removed by spinning at 400g for 5 min at 4 °C. Supernatant containing bacteria was removed and pelleted by centrifugation at 8,000g before staining with SYTO BC bacterial DNA stain (Thermo Fisher; 1:5,000). For staining of cultured bacteria, overnight cultures were pelleted at 8,000g and resuspended in SYTO BC at approximately 10⁹ colony-forming units
(CFU) per millilitre (a density similar to that of faecal bacteria). Stained bacteria were incubated with monoclonal antibodies at 10 μg ml−1 for faeces, or 1 μg ml−1 for cultures in PBS supplemented with 0.25% BSA, for 1 h on ice before washing and incubating with Alexa Fluor 647 conjugated goat anti-human IgG1 secondary antibody (Thermo Fisher, catalogue number A-21445; 1:2,000) with 5% v/v normal goat serum in PBS plus 0.25% BSA for 30 min. Bacteria were washed in PBS plus 0.25% BSA for 15 min, and 0.25 μg ml−1 4′,6-diamidino-2-phenylindole (DAPI) was added immediately before sample acquisition. Samples were run on a FACS Symphony (BD Biosciences), with forward scatter and side scatter set to logarithmic mode. Within each experiment, the same number of events was acquired for all samples. Binding was evaluated in SYTO BC+ DAPI– live bacteria. Data were analysed using FlowJo software version 8.7 or 10.5.3 (Tree Star Inc.).
Faecal bacterial ELISAs were performed with bacteria isolated from freshly collected faeces of either Oligo-MM12 or SPF mice. To regulate the bacterial diversity sampled in SPF mice, faeces from multiple cages of C57BL/6 mice were pooled and prepared as a single sample. Faeces were macerated with disposable micropestles and centrifuged as described in the paragraph above for bacterial flow cytometry. Bacteria were fixed in 0.5% paraformaldehyde for 20 min with continual rotation to avoid clumping, then washed three times in PBS. Poly-L-lysine-coated high-binding ELISA plates were incubated overnight with bacterial preparations at an optical density at 600 nm (OD600) of 0.2 for Oligo-MM12 or an OD600 of 0.35 for SPF. Plates were blocked with PBS plus 1% BSA for 2 h, washed with PBS plus 0.05% Tween, and incubated for 1 h with serial dilutions of monoclonal antibodies, starting at a concentration of 900 nM. Plates were washed and incubated with a horseradish peroxidase (HRP)-conjugated goat anti-human IgG (Fcγ-specific) secondary antibody (Jackson ImmunoResearch, catalogue number 109-035-098) at 1 μg ml−1 for 1 h. Assays were developed using a chromogenic substrate (Sigma) according to the manufacturer’s instructions. Absorbance was read at 450 nm using an AccuScan plate reader (Fisher Scientific).
Polyreactivity ELISAs were carried out as previously described39, with the following modifications. Briefly, high-binding ELISA plates (Costar) were coated with single-stranded DNA, double-stranded DNA, lipopolysaccharide (LPS) and keyhole limpet haemocyanin (KLH) at 10 μg ml−1 and insulin at 5 μg ml−1 in PBS overnight at 4 °C. Plates were washed with PBS plus 0.05% Tween (PBST) and incubated in PBST for 1 h at room temperature. Plates were incubated with serial dilutions of monoclonal antibodies in PBST for 2 h at room temperature, starting at a concentration of 10 μg ml−1. Plates were incubated with HRP-conjugated secondary antibodies, developed and absorbances read as above. Self-reactivity was tested using a QUANTA Lite ANA ELISA kit (Inova Diagnostics) according to the manufacturer’s instructions.
For testing of reactivity to food proteins, crude protein extracts were prepared by grinding 10 g of germ-free autoclaved chow to a powder in a pestle and mortar, then shaking at 120 r.p.m. overnight in 40 ml PBS at 37 °C. Extracts were clarified by spinning at 4,000g for 10 min and then 21,130g for 1 h at 4 °C before filtering through a 0.22-μm filter. Final protein concentrations were determined using a bicinchoninic acid (BCA) assay (Pierce). ELISA plates were coated overnight with 100 μg ml−1 of food extract, blocked with PBS plus 2% BSA, and incubated with serial dilutions of monoclonal antibody in PBS plus 2% BSA starting at 10 μg ml−1 for 2 h. Assays were incubated with HRP-conjugated secondary antibodies, developed, and absorbances read as above.
For dot blots of bacterial lysates, overnight cultures of bacteria were centrifuged at 8,000g for 8 min to pellet bacteria and washed once with brain–heart infusion (BHI) medium, then resuspended at a 10× concentration in BHI. These concentrates were normalized to an OD600 of 4.0, then mixed 1:1 with Laemmli sample buffer and boiled for 10 min to lyse cells. Next, 2 μl of prepared bacterial lysates were spotted on nitrocellulose and dried at room temperature overnight. Blots were blocked in PBST (1× PBS and 0.1% Tween) with 1% BSA for 2 h, washed briefly with PBST, then stained with 4 μg ml−1 primary antibody (monoclonal human IgG1s) in PBST with 1% BSA for 2 h. Blots were washed for 5 min in PBST three times, stained with 1:1,000 peroxidase-conjugated goat anti-human IgG (Jackson ImmunoResearch, catalogue number 109-035-098) in PBST with 1% BSA for 1 h, then washed for 5 min in PBST three times. Blots were dabbed dry and then incubated for 5 min with the Clarity enhanced chemiluminescence (ECL) substrate (Bio-Rad); chemiluminescence images were acquired with an ImageQuant LAS4000 (GE) using the same exposure time for all blots.
Self-reactivity was assessed in western blots that probed whole-tissue lysates. The small intestine was dissected from wild-type SPF mice and a 1-cm segment of the distal ileum was removed, cut open longitudinally, and washed thoroughly with PBS to remove lumenal content. The tissue was snap-frozen and then bead-beat directly in Laemmli buffer to pulverize the tissue, lyse the cells, and reduce/denature proteins in one step. Samples were boiled for 10 min and then centrifuged at 16,000g for 5 min. Supernatants were run on 4–20% Tris-glycine gels with SDS, and transferred onto polyvinylidene difluoride (PVDF) membranes. Blots were blocked and stained using the procedure described above for dot blots.

Bacterial culture
A list of bacterial strains is provided in Supplementary Table 4. Bacteria were grown in an anaerobic atmosphere of 10% carbon dioxide, 5% hydrogen, and 85% nitrogen. Members of the Oligo-MM12 consortium were cultured in BHI (Becton Dickinson) supplemented with 5 μg ml−1 hemin (Sigma), 5 μg ml−1 vitamin K1 (Sigma), 250 mg l−1 cysteine, 250 mg l−1 sodium sulfide, and 4 g l−1 porcine mucin type 3 (Sigma) (the latter only for Akkermansia muciniphila YL44), with the exception of Lactobacillus reuteri I49, which was grown in MRS agar (Becton Dickinson).

Statistical analysis
No statistical methods were used to predetermine sample size. Unless otherwise noted, statistical calculations were performed using the tests described in the figure legends in GraphPad Prism version 8.3.0. The Chao1 formula40 was used to estimate total clonal richness in photoactivated germinal centres, calculated using the EstimateS package41. Proportions of clones bearing public-clonotype motifs in our sample versus the Greiff et al.21 database were compared using Fisher’s exact test calculated in R. For the ARGSNYXXXXDY CDRH3, Levenshtein distances were compared between conditions; for the AREGFAY CDRH3, distances were compared between VH1–12 and all clones using a Mann–Whitney test. Exact binomial confidence intervals were calculated using the JavaStat online tool at https://statpages.info/confint.html.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
Incidence-based sequencing raw and processed data are available through BioProject (https://www.ncbi.nlm.nih.gov/bioproject/; identification code PRJNA647715); the analysis pipeline is available at https://github.com/victoraLab/MIBS.

23. Madisen, L. et al. A robust and high-throughput Cre reporting and characterization system for the whole mouse brain. Nat. Neurosci. 13, 133–140 (2010).
24. Dogan, I. et al. Multiple layers of B cell memory with different effector functions. Nat. Immunol. 10, 1292–1299 (2009).
25. Liu, K. et al. In vivo analysis of dendritic cell development and homeostasis. Science 324, 392–397 (2009).
26. Shulman, Z. et al. T follicular helper cell dynamics in germinal centers. Science 341, 673–677 (2013).
27. Trombetta, J. J. et al. Preparation of single-cell RNA-seq libraries for next generation sequencing. Curr. Protoc. Mol. Biol. 107, 4.22.1-17 (2014).
28. Mesin, L. et al. Restricted clonality and limited germinal center reentry characterize memory B cell reactivation by boosting. Cell 180, 18–20 (2020).
29. Lefranc, M. P. et al. IMGT, the international ImMunoGeneTics information system. Nucleic Acids Res. 37, D1006–D1012 (2009).
30. Masella, A. P., Bartram, A. K., Truszkowski, J. M., Brown, D. G. & Neufeld, J. D. PANDAseq: paired-end assembler for illumina sequences. BMC Bioinformatics 13, 31 (2012).
31. Retter, I., Althaus, H. H., Münch, R. & Müller, W. VBASE2, an integrative V gene database. Nucleic Acids Res. 33, D671–D674 (2005).
32. DeWitt, W. S., III, Mesin, L., Victora, G. D., Minin, V. N. & Matsen, F. A., IV. Using genotype abundance to improve phylogenetic inference. Mol. Biol. Evol. 35, 1253–1265 (2018).
33. Notredame, C., Higgins, D. G. & Heringa, J. T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205–217 (2000).
34. Crooks, G. E., Hon, G., Chandonia, J. M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004).
35. Gupta, N. T. et al. Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data. Bioinformatics 31, 3356–3358 (2015).
36. Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
37. Pasqual, G., Angelini, A. & Victora, G. D. Triggering positive selection of germinal center B cells by antigen targeting to DEC-205. Methods Mol. Biol. 1291, 125–134 (2015).
38. Palm, N. W. et al. Immunoglobulin A coating identifies colitogenic bacteria in inflammatory bowel disease. Cell 158, 1000–1010 (2014).
39. Mouquet, H. et al. Polyreactivity increases the apparent affinity of anti-HIV antibodies by heteroligation. Nature 467, 591–595 (2010).
40. Chao, A. Nonparametric estimation of the number of classes in a population. Scand. J. Stat. 11, 265–270 (1984).
41. Colwell, R. K. EstimateS: Statistical estimation of species richness and shared species from samples. Version 9. http://purl.oclc.org/estimates (2013).

Acknowledgements We thank all members of the Victora and Mucida laboratories, past and present, for assistance with experiments, fruitful discussions and critical reading of the manuscript. In particular we thank: A. Rogoz and G. Fayzikhodjaeva for maintaining gnotobiotic mice; S. Gonzalez for maintaining SPF mice; K. Gordon and K. Chhosphel for FACS; the Rockefeller University Genomics Center for RNA sequencing; and employees of the Rockefeller University for continuous assistance. We thank A. Vale (Universidade Federal do Rio de Janeiro, Brazil) for help with analysis of public clonotypes; K. McCoy (University of Calgary, Canada), S. Y. Wong and K. Cadwell (New York University, USA) for providing Oligo-MM12 strains; J. Faith (Mount Sinai School of Medicine, USA) for other bacterial strains; and J. Däbritz and E. Wirthgen (University of Rostock, Germany) for contributing to the training and supervision of C.W. This work was supported by National Institutes of Health (NIH)/National Institute of Allergy and Infectious Diseases (NIAID) grants R01AI119006 and R01AI139117 (to G.D.V.), and NIH/NIAID/National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) grants R01DK093674, R01DK113375 and R21AI144827 (to D.M.), with additional support from NIH grant DP1AI144248 (Pioneer Award) and from the Robertson Foundation to G.D.V. and NIH grant R01DK116646 (Transformative Award) to D.M. C.R.N. is a Human Frontier of Science Program postdoctoral fellow. G.P.D. is a Robert Black Fellow of the Damon Runyon Cancer Research Foundation. A.S. is a Boehringer-Ingelheim Fonds PhD fellow. G.D.V. and D.M. are Burroughs-Wellcome Investigators in the Pathogenesis of Infectious Disease. G.D.V. is a Searle Scholar, a Pew-Stewart Scholar, and a MacArthur Fellow.

Author contributions C.R.N. and L.M. performed all mouse and antibody-sequencing experiments, with help from T.A. C.W. and C.R.N. produced and assayed the reactivity of monoclonal antibodies by ELISA, with help from A.S. G.P.D. stained monoclonal antibodies in cultured bacteria and carried out dot and western blots. A.M.B. optimized and performed flow cytometry of faecal bacteria. A.A.K.L. established the protein-free-diet protocol. L.M. and T.B.R.C. designed and performed all bioinformatics analyses. C.R.N., D.M. and G.D.V. conceptualized the study, designed all experiments, and wrote the manuscript with input from all authors.

Competing interests The authors declare no competing financial interests.

2865-9.
Correspondence and requests for materials should be addressed to D.M. or G.D.V.
Peer review information Nature thanks Ramy Arnaout, Rachael Bashford-Rogers and Hai Qi for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Extended Data Fig. 1 | Clonal replacement in steady-state gaGCs. a, Gating strategy for PA-GFP mice used in Fig. 1a–d. GC, germinal centre. b, Gating strategy and efficiency of labelling in germinal centres of AID-Confetti mice seven days after the administration of tamoxifen, as in Fig. 1e. The labelling efficiency is calculated as 100% minus the product of the percentage of unlabelled cells in the GFP/YFP, RFP and CFP channels. c, Gating strategy for the S1pr2CreERT2/tdTomato fate-mapping experiments shown in g and Fig. 1h. All flow plots are representative of multiple experiments. d, Multiphoton images of Peyer’s patches from SPF mice at different times after tamoxifen treatment (see Fig. 1f). Values in parentheses in images are NDS values. e, Quantification of multiple images as exemplified in d (see also Fig. 1g). Data are from three to five mice per time point at days 14–35, and one to two mice per time point for day 7 and later times. f, Size of germinal centres in Peyer’s patches (PPs) versus mLNs, calculated from samples obtained seven days after tamoxifen treatment as in d and Fig. 1f, plotted as the cross-sectional area of the largest available z-section. Each symbol represents one germinal centre. Lines show medians; P values are from two-tailed Mann–Whitney U tests. Data are pooled from multiple mLNs and Peyer’s patches of two mice from two independent experiments. g, Turnover of B cell clones in germinal centres of Peyer’s patches from S1pr2CreERT2 × Rosa26Stop-tdTomato mice (see Fig. 1h).
Extended Data Fig. 2 | Binding characteristics of ‘winner’ gaGC clones from SPF mice. a, Gating strategy for isolating AID-Confetti single germinal centres shown in b, c, Figs. 2a, 3d, e and Extended Data Figs. 6a, b, 8a. CR, CFP and/or RFP; non-CR, non-CFP, non-RFP; GY, GFP and/or YFP. b, c, Additional Igh sequence relationships among B cells from high-NDS germinal centres (b) and one low-NDS germinal centre (c) (see Fig. 2a). Scale bars, 50 μm. In c, each tree is for a separate clone (defined as a unique V(D)J rearrangement). Only clones with more than five cells are shown (grey slices in pie charts). d, Gating strategy for bacterial flow cytometry, performed in e, Figs. 2b, 3f, h, i and Extended Data Fig. 6d. e, Flow-cytometry analysis of the binding of recombinant monoclonal antibodies to faecal bacteria isolated from SPF mice. Plots gated as in d. All plots are representative of data obtained from at least two separate experiments. f, Summary of the reactivity of SPF monoclonal antibodies, assayed by ELISA against food protein extracts, autoantigens (anti-nuclear antibody, ANA), and a five-antigen polyreactivity panel comprising single-stranded DNA, double-stranded DNA, keyhole limpet haemocyanin (KLH), insulin and LPS. Shown are background-subtracted OD450 values. Data representative of assays repeated in at least three separate experiments.
[Figure: Extended Data Fig. 3, panels a–c. a, Quantitative PCR, total 16S (SPF versus Oligo-MM12; faeces and caecum). b, c, Quantitative PCR, species-specific 16S (Ct value, with LOD marked; generations P and F1). Species legend: Clostridium clostridioforme YL32, Clostridium innocuum I46, Acutalibacter muris KB18, Akkermansia muciniphila YL44, Bacteroides caecimuris I48, Blautia coccoides YL58, Enterococcus faecalis KB1, Flavonifractor plautii YL31, Muribaculum intestinale YL27, Lactobacillus reuteri I49, Turicimonas muris YL45.]
Extended Data Fig. 3 | Stable vertical transmission of the Oligo-MM12 consortium. a–c, qPCR of total (a) and strain-specific (b, c) 16S DNA from faecal samples of mice stably colonized with the Oligo-MM12 consortium. In a, ΔCt values were calculated with respect to a reference SPF sample, marked by the black filled symbols, with which all other values were compared. In c, Ct values were used to quantify the relative abundance of each species (see Methods). LOD, limit of detection. F1 refers to the first generation after the parental strain (P, colonized by gavage). Note that Bifidobacterium animalis (YL2) is usually undetectable in faeces19.
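As a rough illustration of how a ΔCt value relative to a reference sample translates into a fold difference, the sketch below uses the conventional assumption that template quantity doubles with each PCR cycle. The function names and the `efficiency` parameter are hypothetical, and the authors' actual quantification procedure is described in their Methods, not reproduced here.

```python
def delta_ct(ct_sample, ct_reference):
    """Difference in threshold cycle between a sample and the reference."""
    return ct_sample - ct_reference


def relative_abundance(ct_sample, ct_reference, efficiency=2.0):
    """Fold abundance of the sample relative to the reference.

    Assumption: amplification multiplies template by `efficiency` per
    cycle (2.0 means perfect doubling), so a higher Ct means less template.
    """
    return efficiency ** (-delta_ct(ct_sample, ct_reference))


# Hypothetical Ct values: a sample crossing threshold three cycles
# later than the reference holds one eighth as much template.
print(relative_abundance(25.0, 22.0))
```

A sample with the same Ct as the reference returns 1.0, which corresponds to the reference point against which all other values are compared.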
Extended Data Fig. 4 | Frequency and isotype distribution of gaGCs in germ-free and Oligo-MM12-colonized mice. a, Gating strategy for analysing the frequency of germinal centres and distribution of isotypes (results shown in b–d). b, Frequency of cells with the phenotype of germinal centres (CD38–FAShi) among total B220+ B cells in the indicated organs of mice raised under the indicated conditions. Each symbol represents one mouse. SPF, n = 25; germ-free (GF), n = 16; Oligo-MM12, n = 11. c, Frequency of germinal-centre B cells positive for the indicated surface BCR isotype in different organs of mice raised under the indicated conditions. Data are from at least three mice per group, as in d. Data are presented as means ± s.e.m. d, Statistical analysis of selected isotypes and anatomical locations, using data from c. Each symbol represents one mouse. Lines indicate medians; P values are obtained from two-tailed Kruskal–Wallis tests carried out on each trio, with Dunn’s multiple comparisons post-test. All P values below 0.05 are reported.
[Figure: Extended Data Fig. 5, panels a–g. a, Individual GC gating strategy (All cells, Lymphocytes, Single cells, B cells, GC; axes SSC-A versus FSC-A, FSC-H versus FSC-A, TCRβ APC-Cy7 versus B220 BV421, FAS PE-Cy7 versus CD38 APC). b, AicdaCreERT2/+.Rosa26Confetti/Confetti (day 7 post-tamoxifen), individually dissected GCs from mice GF1–GF5 (number of cells sequenced; dominant clone, expanded clones, singletons). c, Clonal richness (clones per GC, Chao1), clonal diversity (D50, fraction of clones accounting for 50% of sequenced cells) and size of largest clone (% of cells in the GC) for mLN GCs. d, % GCs with largest clone > 50% (P < 0.0001). e, Oligo-MM12 mLN and Peyer’s patch, day 21 post-tamoxifen. f, Oligo-MM12: coloured cell density (fluorescent cells/100 µm2), colour dominance (% of fluorescent B cells) and density × dominance (normalized dominance score; density > 0.4 only), by days post-tamoxifen, for mesenteric LN and Peyer’s patches. g, Oligo-MM12, days 20–23: % GCs > NDS 0.75 in mesenteric LN (P SPF×MM12 = 0.055; P GF×MM12 = 0.208) and % GCs > NDS 0.5 in Peyer’s patches (P SPF×MM12 = 0.015; P GF×MM12 > 0.999).]
Extended Data Fig. 5 | Clonal selection in germ-free and Oligo-MM12-colonized mice. a, Gating strategy for germ-free AID-Confetti single germinal centres used in b–d. b–d, Sequencing of Igh genes from B cells obtained from individual mLN germinal centres. Germinal-centre B cells were single-cell-sorted from fragments of vibratome slices containing single germinal centres. To avoid biased selection of germinal centres based on NDS or loss of germinal centres with a low density of coloured cells, mLNs were harvested at five to seven days after treatment with tamoxifen, before extensive selection or clonal turnover; both fluorescent and non-fluorescent cells were included in the sample. This unbiased selection ensures that data are comparable to those obtained using in situ photoactivation (Fig. 1a–d), which we could not perform because the photoactivatable GFP-transgenic strain is not available under germ-free status. b, Clonal composition of individual germinal centres from five mice (GF1–GF5). C, caecal-colonic mLN; J, jejunal mLN. c, Quantification of data from b. Each symbol represents one germinal centre. d, Proportion of germinal centres in which the largest clone accounts for more than 50% of all B cells in mLNs of SPF mice (data from Fig. 1b) and germ-free mice (data from b). P values are from two-tailed Fisher’s exact tests. Centre bars represent the proportion in the sample; error bars show the exact binomial 95% confidence interval. e, Multiphoton images of Oligo-MM12 mLNs and Peyer’s patches at different times after treatment with tamoxifen. Blue represents collagen (second harmonics); white shows autofluorescence; other colours are from the Confetti allele. Scale bars, 200 μm (overviews), 50 μm (close-ups). N/D, NDS not determined owing to a low density of coloured cells. f, Quantification of images as in e for mLNs (top) and Peyer’s patches (bottom). Each symbol represents one germinal centre. Medians are indicated. Only germinal centres with a density of more than 0.4 fluorescent cells per 100 μm² are included in the NDS calculations. g, Proportion of germinal centres with NDS values of more than 0.75 in mLNs (top) and more than 0.5 in Peyer’s patches (bottom) under SPF, germ-free and Oligo-MM12 conditions at 20–23 days after tamoxifen; SPF and germ-free data are as in Fig. 3c. For SPF, Oligo-MM12 and germ-free mLN gaGCs, n = 57, 16 and 27, respectively; for gaGCs from Peyer’s patches, n = 20, 10 and 9, respectively. P values obtained by two-tailed Fisher’s exact tests. Error bars represent exact binomial 95% confidence intervals. All data are from three to five mice per time point.
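The ‘exact binomial 95% confidence interval’ used for the error bars in this legend is conventionally the Clopper–Pearson interval. The sketch below shows one way to compute it using only the Python standard library, by bisection on the binomial tail probabilities. It assumes that the Clopper–Pearson construction is what is meant; the function names and the example counts are hypothetical and are not taken from the paper's data.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(func, target, iterations=200):
    """Bisection for func(p) = target, where func increases with p on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if func(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_binomial_ci(k, n, alpha=0.05):
    """Clopper-Pearson interval for k successes out of n trials."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("require n > 0 and 0 <= k <= n")
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _solve_increasing(
        lambda p: 1.0 - binom_cdf(k - 1, n, p), alpha / 2.0)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _solve_increasing(
        lambda p: 1.0 - binom_cdf(k, n, p), 1.0 - alpha / 2.0)
    return lower, upper


# Hypothetical counts: 12 of 27 germinal centres above a threshold.
print(exact_binomial_ci(12, 27))
```

Because the interval is built from the binomial distribution itself and not from a normal approximation, it remains valid for the small per-condition sample sizes reported here (for example, n = 9 or 10 germinal centres).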
[Figure: Extended Data Fig. 6, panels a–f. a, b, Clonal trees from germ-free and Oligo-MM12 mLN germinal centres (UA; n, number of cells with identical sequence; inferred precursor; branch unit, 1 mutation). c, GF mAbs: ELISA panel (food, ANA, ssDNA, dsDNA, LPS, insulin, KLH). d, GF mAbs, flow cytometry on SPF fecal bacteria (anti-human IgG1 Alexa Fluor 647). e, GF mAbs, ELISA on SPF fecal bacteria (Abs. 450 nm versus mAb concentration, nM). f, GF mAbs, WB on SPF ileum protein extract. GF mAbs (n = 9): G082, G196, G198, G200, G202, G204, G206, G208, G226. Controls: ED38 (+), MG053 (−), MG038, 3H9, secondary antibody only.]
Extended Data Fig. 6 | Characteristics of ‘winner’ gaGC clones from germ-free and Oligo-MM12-colonized mice. a, b, Additional Igh sequence relationships among B cells from high-NDS germinal centres of germ-free (a) and Oligo-MM12-colonized (b) mice. Details are as in Fig. 2a. Scale bars, 50 μm. c, Reactivity summary of germ-free monoclonal antibodies assayed by ELISA against food protein extracts, autoantigens (anti-nuclear antibody, ANA), and a five-antigen polyreactivity panel. Shown are background-subtracted OD450 values. d, Flow-cytometry analysis of the binding of monoclonal antibodies from germ-free mice to faecal bacteria from SPF mice. Details are as in Fig. 2b. e, ELISA analysis of the binding of monoclonal antibodies from germ-free mice to faecal bacterial fractions from SPF mice. MG053 was assayed at three dilutions only. Other monoclonal antibodies were assayed at dilutions indicated on the x-axis. Lines show the means of two assays. f, Western blot (WB) analysis of the binding of monoclonal antibodies from germ-free mice to a protein extract from mouse ileum tissue, run on a single-well 4–15% gel and blotted using a multiwell mask. Monoclonal antibody 3H9 is a DNA-specific negative control. Data in c–f are representative of two or more independent experiments.
[Figure: Extended Data Fig. 7, panels a, b. a, Dendrograms for clonotype VH1-47 (CDR H3 motifs ARRSN(Y/F)/12, ARGSNY/12, ARGGFY/11, ARGSNY/11, ARGSSY/11, ARGTNY/12 and ARGSNF/(11/12); JH4 and JH2) and clonotype VH1-12 (AREGFVY, VREGFAY, TREGFAY, AREGFTY, AREGFAH and AREGFAF); scale, 0.01. Mice: GF 1, GF 2, GF 3, GF 5, GF 7, MM12 1, MM12 2. b, Heat maps for VH1-47 and VH1-12 showing targeted amino acid (IMGT position) by mouse, with the number of cells per mouse, the most frequent replacements and a frequency scale from 0 to >50%.]
Extended Data Fig. 7 | Mutational patterns in germ-free/Oligo-MM12 public clonotypes. a, Dendrograms showing the sequence relationships between VH1–47 and VH1–12 clones in different mice. All clones with up to two-amino-acid differences from the public-clonotype CDR H3 motifs are included. b, Heat maps showing the frequency of amino-acid replacements along the VH1–47 and VH1–12 families in germ-free (blue) and Oligo-MM12 (green) mice, using the same data as in Fig. 4b. Only mice with more than two cells within the specified clone were included in the analysis. The number of cells analysed per mouse is indicated at the top of each column. Only those amino acids mutated in at least three (VH1–47) or two (VH1–12) mice are listed on the left, using Immunogenetics (IMGT; http://www.imgt.org) numbering; to the right, the most frequent amino-acid replacement in each mouse is given. Arrows indicate recurrent amino-acid mutations found in five of six mice (VH1–47) or three of three mice (VH1–12).
[Figure: Extended Data Fig. 8, panels a–c. a, Oligo-MM12-colonized mouse: multiphoton images of caecal mLN, ileal mLN and Peyer’s patch germinal centres, and a clonal tree of the VH1-12/AREGFAY/JH3 clone (UA; n, number of cells with identical sequence; inferred precursor; branch unit, 1 mutation; colours CFP, YFP, RFP and CFP/YFP). b, GC frequency (% of B cells) in whole organs (WT) for SPF, GF and PFC mice: D mLN, J mLN, I mLN, C mLN, pPP, dPP. c, Cells sequenced from PFC 1, PFC 2 and PFC 3 (tissues D, I, C, PP), coloured by clonotype: VH1-47/ARGSNY.../JH4, VH1-47/ARGSNY.../JH2, VH1-47/ARGSNY.../JH3, other VH1-47 H3, VH1-12/AREGFAY/JH3, other VH1-12, other V segments.]
Extended Data Fig. 8 | Stereotypical germ-free IgH clonotypes are present in Oligo-MM12 and germ-free/dietary-protein-free conditions. a, Massive expansion of a public VH1–12 clonotype across different secondary lymphoid organs of mouse MM12 1 (from Fig. 4b), at 21 days after tamoxifen treatment. Multiphoton images show all three germinal centres sequenced from this mouse (yellow dotted boxes), magnified in the side panels. Scale bars, 200 μm (overviews) and 50 μm (close-ups). mLN close-ups are from different image acquisitions of the same germinal centre. A clonal tree of all cells from this clone is shown at the bottom right. Arrowheads indicate clonal bursts and the organ of origin of cells with that particular sequence. b, Frequency of cells with a germinal-centre phenotype (CD38dim FAShi) among total B220+ B cells in the indicated organs of mice raised on protein-free chow (PFC). Data for SPF and germ-free mice are reproduced from Extended Data Fig. 4b. Each symbol represents one mouse. For PFC, n = 8 mice. c, Clonal distribution of germinal-centre B cells sequenced from the indicated tissues of three separate mice (PFC1–3), with public clonotypes colour-coded. See also Fig. 4b. C, caecal-colonic mLN; D, duodenal mLN; I, ileal mLN; PP, Peyer’s patch.
Extended Data Fig. 9 | Multiwell incidence-based Igh sequencing reveals clonal overlap among individual mice and between microbial colonization conditions. a, Overview of the incidence-based Igh sequencing method used for c–g and Fig. 4c, d. To identify expanded public clonotypes among gaGC samples from multiple mice with high confidence, we developed an incidence-based sequencing strategy based on repeated sampling of the same germinal-centre B cell population. We sorted multiple samples of 100 germinal-centre B cells (usually 32 for mLN and 16 for Peyer’s patches) from 6 germ-free, 6 SPF, and 7 Oligo-MM12-colonized mice, and sequenced all BCRs in each sample, for a total of roughly 80 thousand input B cells, plus 32 wells each of non-germinal-centre B cells from the mLN of 3 germ-free and 3 SPF mice as controls. To avoid counting as ‘public’ sequences that were spuriously present in different mice owing to barcode misassignment or DNA contamination, we included in our analysis only those clones that were represented by more than five reads in any single well and found in at least two wells from the same sample. Key bioinformatics steps are described in the figure; see Methods for a full description of the bioinformatic pipeline. b, Gating strategy used for data in c–g and Fig. 4c, d, described in a. c, Number of distinct clones per well, after collapsing sequences with matching VH, JH, and CDR H3 nucleotide sequences. Each symbol represents one well. Boxes represent medians and interquartile ranges. As expected, non-germinal-centre B cell samples had many more total clones per well than did germinal-centre B cells. d, Proportion of expanded clones (present in more than one well per sample) in germinal-centre and non-germinal-centre samples from mLNs and Peyer’s patches of mice held under the specified conditions. e, Histograms showing Levenshtein distances between the indicated consensus CDR H3 sequence and the CDR H3 sequence of all clones in the indicated category. For ARGSNYXXXXDY, distances are plotted for clones carrying the ‘correct’ VH1–47 gene or two ‘control’ VH regions with similar usage frequency in our sample. P values were obtained by Kruskal–Wallis test comparing all three conditions. Owing to the very low number of total VH1–12 clones outside of the germ-free condition, distances to the AREGFAY CDR H3 are compared between VH1–12 clones and all clones. P values obtained by two-tailed Mann–Whitney U test. f, Fraction of clone*wells containing public clonotypes in each condition, pooled from all mice. P values were obtained by Fisher’s exact test. g, Venn diagram showing the number of clones per condition (pooled from all mice) and overlap between conditions. The clone in the centre of the graph (SPF/Oligo-MM12/germ-free overlap) corresponds to the VH1–47 public clonotype. In f, g, data are as in Fig. 4d.
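Two computational steps named in this legend, the clone-inclusion rule in a and the Levenshtein distance in e, can be sketched as follows. This is a minimal illustration and not the authors' pipeline (their code is at the GitHub address given in the reporting summary). The data layout, function names and thresholds expressed as parameters are hypothetical, and the filter encodes one reading of the rule: a clone is kept for a sample if it has more than five reads in at least one well and is detected in at least two wells of that sample. How the ‘X’ positions of the ARGSNYXXXXDY motif are handled is not specified in the legend and is not modelled here.

```python
from collections import defaultdict


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def confident_clones(read_counts, min_reads=6, min_wells=2):
    """Return the (sample, clone) pairs that pass the inclusion rule.

    read_counts maps (sample, well, clone) to a read count; a clone
    identifier stands for a collapsed VH, JH and CDR H3 combination.
    """
    wells_seen = defaultdict(set)
    well_supported = set()
    for (sample, well, clone), reads in read_counts.items():
        if reads > 0:
            wells_seen[(sample, clone)].add(well)
        if reads >= min_reads:  # "more than five reads" in a single well
            well_supported.add((sample, clone))
    return {key for key, wells in wells_seen.items()
            if len(wells) >= min_wells and key in well_supported}


# Hypothetical counts for one sample with two wells.
counts = {
    ("GF1", "A1", "c1"): 9, ("GF1", "A2", "c1"): 1,  # kept
    ("GF1", "A1", "c2"): 20,                         # only one well
    ("GF1", "A1", "c3"): 3, ("GF1", "A2", "c3"): 4,  # never more than five reads
}
print(confident_clones(counts))
print(levenshtein("AREGFAY", "AREGFVY"))
```

Requiring incidence in at least two wells of the same sample is what makes the approach robust to barcode misassignment: a spurious read is unlikely to recur independently in a second well.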
Reporting Summary
Corresponding author(s): Gabriel Victora, Daniel Mucida

Software and code

Data collection BD FACSDiva software v8.0.2 was used for flow cytometry data acquisition. Sequencing data were acquired using the Illumina MiSeq
platform. Microscopy images were acquired using Olympus FluoView v4.2 acquisition software.
Data analysis Data were analysed using FlowJo (TreeStar) v8.7 and v10.5.3, Prism (GraphPad) v8.3.0, ImageJ v1.51 and R v3.6.3. Sequencing analysis was
carried out using PANDAseq v2.11, IMGT/HighV-QUEST v1.6.9, VBASE2, FASTX-Toolkit v0.0.13, Change-O v0.4.6, the T-Coffee algorithm
(Notredame et al. 2000) and GCtree v1 (DeWitt et al. 2018). Circular ideogram plots were created using Circos v0.69-9. Dendrograms
were generated using Clustal X v2.1 and FigTree v1.4.4. Data were presented using Adobe Illustrator v23.0.4 and 15.1.0.
Data
Single-cell sequences are available as a supplementary spreadsheet. These refer to Figures 2, S2, 3, 4, 5 and S6 as labeled in the spreadsheet. Incidence-based
sequencing data are available at https://github.com/victoraLab/MIBS.

Sample size No statistical methods were used to determine sample size. Each experiment was repeated across multiple animals.
Data exclusions Data were excluded only for technical reasons. In bacterial flow cytometry, data were excluded if the background secondary antibody binding was too
high and no mAb binding could be observed (experiments appear in Figures 2, S2 and 4).
Replication All experiments were reproducible and were repeated as detailed in the figure legends. Sequencing in Figure 5c-g was carried out once, but on
a large sample size (6 or 7 animals per group in 3 groups).
Randomization Mice of either sex were used for most studies. For sequencing experiments in Figure 5c-g, age- and sex-matched mice were used. Mice were
allocated into groups based on genotype and colonization status (not randomized).
Blinding Investigators were not blinded to group allocation; all of the measurements reported are objectively quantifiable.


Antibodies
Antibodies used B220- BV421, BioLegend, #103240, RA3-6B2, Lot: B288312, 1:200 dilution
B220- BV605, BioLegend, #563708, RA3-6B2, Lot: 8201934, 1:200 dilution
B220- BV711, BioLegend, #103255, RA3-6B2, Lot: B267109, 1:200 dilution
CD38- APC, BioLegend, #17-0381-82, 90, Lot: 2068810, 1:200 dilution
CD38- PerCP-Cy5.5, BD, #562770, 90, Lot: 9112983, 1:200 dilution
FAS- PE-Cy7, BD, #557653, Jo2, Lot: 9039631, 1:400 dilution
FAS- BV421, BD, #562633, Jo2, Lot: 9029848, 1:400 dilution
CD16/32 (Fc block)- Bio-X-Cell, BE0307, 2.4G2, 1:200 dilution
TCR β- APC-e780, Invitrogen, #47-5961-82, H57-597, Lot: 2114197, 1:200 dilution
CD3- BV785, BioLegend, #100232, 17A2, Lot: B277518, 1:200 dilution
CD4- BV785, BioLegend, #100552, RM4-5, Lot: B264992, 1:200 dilution
CD8- BV785, BioLegend, #100749, 53-6.7, Lot: B258589, 1:200 dilution
Nk1-1- BV785, BioLegend, #108749, PK136, Lot: B279624, 1:200 dilution
IgM- PE-Cy7, eBioscience, #25-5790-81, Il/41, Lot: 2039912, 1:200 dilution
IgG2b- AF488, Southern Biotech, #1090-30, Lot: A2513-X665G, 1:500 dilution

IgA- Biotin, Southern Biotech, #1040-08, Lot: I5613-P366E, 1:500 dilution
IgA- PE, eBioscience, #12-4204-83, mA-6E1, Lot: E01650-1634, 1:200 dilution
IgG1- APC, BioLegend, #406610, RMG-1, Lot: B247242, 1:200 dilution
Streptavidin- APC-e780, Invitrogen, #47-4317-82, Lot: 2005846, 1:200 dilution
goat anti-human IgG1- Alexa Fluor 647, Thermo Fisher, #A-21445, Lot: 1962791, 1:2,000 dilution
goat anti-human IgG (Fcɣ specific)- HRP, Jackson Immuno Research, #109-035-098, Lot: 13741, 1:1,000 dilution
MAbs produced in-house:
CD35, clone 8C12
S078.U

S078
S080
S116
S118
S120.U
S120
S210
S212
G082
G196
G198
G200
G202
G204
G206
G208
G226
M216.U
M216
M218.U
M218
M220.U
M220
M222
M224
M228
M232
Validation All fluorescent antibodies were validated as described on the manufacturers' websites. HRP-conjugated antibodies were validated in-house
by ELISA measuring full-length IgG1 antibody concentration of commercially purchased standards. mAbs produced by us in this
study were validated by SDS-PAGE, ELISA, spectrophotometry (NanoDrop) and bio-layer interferometry (Octet Red 96) to ensure
proper expression, folding and concentration.

Cell line source(s) ATCC, HEK 293F cells
Authentication Cell lines were not authenticated; validation of functionality was established measuring the quantity and quality of the
produced antibody.
Mycoplasma contamination All cell lines tested negative for mycoplasma contamination.
Commonly misidentified lines None used.


Laboratory animals Mice of both sexes, aged 8–12 weeks, were used in all studies and for all strains.
Rosa26.Confetti (013731) and Rosa26.Stop.TdTomato (007914) mice were from The Jackson Laboratory. AicdaCreERT2 mice
were provided by Claude-Agnès Reynaud and Jean-Claude Weill (Institut Necker). S1pr2CreERT2 BAC transgenic mice were provided
by T. Okada (RIKEN Yokohama) and T. Kurosaki (U. Osaka). PA-GFP mice were generated by G. Victora and M. Nussenzweig
(Rockefeller University).
Wild animals The study did not involve wild animals.
Field-collected samples This study did not involve samples collected from the field.
Ethics oversight All animal procedures were approved by the Institutional Animal Care and Use Committee of the Rockefeller University.
Flow Cytometry

Plots
Confirm that:
The axis labels state the marker and fluorochrome used (e.g. CD4-FITC).
The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers).
All plots are contour plots with outliers or pseudocolor plots.
A numerical value for number of cells or percentage (with statistics) is provided.
Methodology
Sample preparation Cells were isolated by maceration with disposable micropestles (Axygen) in 100 μl of PBS supplemented with 0.5% BSA and 1
mM EDTA (PBE), and single cell suspensions obtained by two passes through a 70 μm mesh. Cells were stained with fluorescently
labeled antibodies on ice for 30 minutes.
Instrument Samples were run on a FACS LSRII or FACS Symphony (BD). For cell sorting, samples were run on a FACS Aria (BD).
Software BD FACSDiva software v8.0.2 was used for flow cytometry data acquisition. Data were analyzed using the FlowJo software package (Tree Star,
USA) v10.5.3 and v8.7.
Cell population abundance Most cells were single cell index sorted into 96-well PCR plates, with single-cell precision. For bulk sorting, 100 cells per well were
sorted with single-cell precision.
Gating strategy All positive and negative populations were determined by compensation with single color controls. For sorting and analysis, all
lymphocytes were first gated based on SSC-A vs FSC-A, followed by 2 singlet gates (FSC-H vs FSC-A and SSC-H vs SSC-A). For GC
gating, cells were gated on either TCRbeta- or Dump-, B220+, CD38-, Fas+ and interrogated for IgM, IgG1, IgG2b or IgA. For AID-Confetti
sorting experiments, cells were gated on SSC-A vs FSC-A in the same way as GC cells. TCRbeta-B220+CD38-Fas+ cells (GC
B cells) were then plotted as follows: CFP vs RFP, GFP vs YFP, and all colored GC cells were single-cell index sorted. For PA-GFP
sorting experiments, cells were gated as above for GC cells, then GFP+ (photoactivated) GC cells were single-cell index sorted. For
GF single GC experiments (AID-Confetti), cells were gated as described, but both fluorescent and non-fluorescent
TCRbeta-B220+CD38-Fas+ cells were index sorted. For non-fluorescent sequencing analysis, cells were gated as above for GC cells and
single-cell index sorted. For bacterial flow cytometry, cells were gated on SSC-A vs FSC-A with only far outliers removed. Then,
SYTO+DAPI- live bacteria were assayed for mAb binding.
Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information.
Article
Receptor binding and priming of the spike protein of SARS-CoV-2 for membrane fusion

https://doi.org/10.1038/s41586-020-2772-0
Donald J. Benton1,6 ✉, Antoni G. Wrobel1,6 ✉, Pengqi Xu2,3, Chloë Roustan4, Stephen R. Martin1, Peter B. Rosenthal5, John J. Skehel1 & Steven J. Gamblin1 ✉
Published online: 17 September 2020

Infection with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is initiated by virus binding to the ACE2 cell-surface receptors1–4, followed by fusion of the virus and cell membranes to release the virus genome into the cell. Both receptor binding and membrane fusion activities are mediated by the virus spike glycoprotein5–7. As with other class-I membrane-fusion proteins, the spike protein is post-translationally cleaved, in this case by furin, into the S1 and S2 components that remain associated after cleavage8–10. Fusion activation after receptor binding is proposed to involve the exposure of a second proteolytic site (S2′), cleavage of which is required for the release of the fusion peptide11,12. Here we analyse the binding of ACE2 to the furin-cleaved form of the SARS-CoV-2 spike protein using cryo-electron microscopy. We classify ten different molecular species, including the unbound, closed spike trimer, the fully open ACE2-bound trimer and dissociated monomeric S1 bound to ACE2. The ten structures describe ACE2-binding events that destabilize the spike trimer, progressively opening up, and out, the individual S1 components. The opening process reduces S1 contacts and unshields the trimeric S2 core, priming the protein for fusion activation and dissociation of ACE2-bound S1 monomers. The structures also reveal refolding of an S1 subdomain after ACE2 binding that disrupts interactions with S2, which involves Asp614 (refs. 13–15) and leads to the destabilization of the structure of S2 proximal to the secondary (S2′) cleavage site.
Recognition of the ACE2 receptor by the membrane spike glycoprotein resolve ten distinct species of spike and spike–ACE2 complexes (Fig. 1
of SARS-CoV-2 is a major determinant of virus infectivity, pathogenesis and host range. Previous structural studies on the spike glycoproteins of coronaviruses6,16–22 have shown that the spike trimer consists of a central helical stalk, comprising three interacting S2 components, that is covered at the top by S1. Each S1 component consists of two large domains, the N-terminal domain (NTD) and receptor-binding domain (RBD), each associated with a smaller intermediate subdomain. In virus membranes, spike glycoproteins exist in a closed form, in which the RBDs cap the top of the S2 core and are inaccessible to ACE2, and in an open form, in which one S1 component has opened to expose the RBD for ACE2 binding6,16,18,23. Recent structural studies7,24,25 on the isolated RBD of the SARS-CoV-2 spike protein in complex with ACE2 have provided a molecular description of the receptor-binding interface. Although some comparisons can be inferred from the previous cryo-electron microscopy studies on the spike protein of SARS-CoV12,18,19,23, structures of intact trimeric SARS-CoV-2 spike with bound ACE2 are needed to determine the effects of binding on the overall spike conformation.

To examine this interaction between the SARS-CoV-2 spike protein and its receptor, we mixed the ectodomains of furin-cleaved spike with the ectodomains of ACE2 and incubated them for around 60 s before plunge-freezing the mixture in liquid ethane for examination by cryo-electron microscopy. In the images that we obtained, we could

and Extended Data Fig. 1), ranging from tightly closed, unbound trimers to open trimers that formed complexes with three ACE2 molecules and dissociated monomeric S1–ACE2 complexes. Of the spike trimers analysed, two thirds were bound to ACE2 (Extended Data Fig. 1). Of the unbound species, we observe good-quality particles in the closed unbound conformation, equally compact to those reported in our previous study26 and slightly more so than those described in previous reports6,16. There are also considerable numbers (16% of all trimers) of unbound particles with one erect RBD, as well as some (4%) in an intermediate conformation, a less-compact closed form, with a single disordered RBD, which have also been reported in a previous study of the furin-cleaved spike protein26.

Of the spike trimers bound to the receptor, half accommodate one ACE2 receptor. As previously reported for the SARS-CoV spike protein12,23, the ACE2-bound RBD occupies a range of tilts with respect to the long axis of the trimer (Extended Data Fig. 2a). Of the two RBDs per trimer that are not engaged with the receptor, either both are closed or one of the RBDs remains closed and one (either clockwise or anticlockwise to the bound S1 (Extended Data Fig. 1)) is in the open conformation. We were also able to identify, reconstruct and refine trimers to which two or three ACE2 receptors were bound, in successively more open structures (Fig. 1 and Extended Data Fig. 1).
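The particle populations quoted above can be tallied as a quick consistency check. The following is a minimal sketch, not part of the authors' analysis: it assumes that the 4% intermediate class is, like the 16% one-erect-RBD class, a percentage of all trimers, and that the rest of the unbound third is in the closed conformation. The derived values are rough, because "two thirds" and "half" are themselves approximate.

```python
from fractions import Fraction

# Fractions quoted in the text, all relative to the spike trimers analysed.
bound = Fraction(2, 3)                    # "two thirds were bound to ACE2"
unbound_one_erect = Fraction(16, 100)     # "16% of all trimers"
unbound_intermediate = Fraction(4, 100)   # "some (4%)"; ASSUMED to be % of all trimers
one_ace2 = bound / 2                      # "half accommodate one ACE2 receptor"

# Derived by subtraction; these two values are NOT quoted in the text.
unbound_closed = 1 - bound - unbound_one_erect - unbound_intermediate
two_or_three_ace2 = bound - one_ace2

populations = {
    "unbound, closed (derived)": unbound_closed,
    "unbound, one erect RBD": unbound_one_erect,
    "unbound, intermediate": unbound_intermediate,
    "one ACE2 bound": one_ace2,
    "two or three ACE2 bound (derived)": two_or_three_ace2,
}


def as_percent(fraction):
    """Convert a Fraction of all trimers to a percentage rounded to 0.1."""
    return round(float(fraction) * 100, 1)


for name, fraction in populations.items():
    print(f"{name}: {as_percent(fraction)}%")
```

Under these assumptions, roughly 13% of trimers would be closed and unbound, and about a third would carry a single ACE2. The exact per-class percentages are given in Extended Data Fig. 1 and take precedence over this estimate.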
1Structural Biology of Disease Processes Laboratory, Francis Crick Institute, London, UK. 2Precision Medicine Center, The Seventh Affiliated Hospital, Sun Yat-sen University, Shenzhen, China. 3Francis Crick Institute, London, UK. 4Structural Biology Science Technology Platform, Francis Crick Institute, London, UK. 5Structural Biology of Cells and Viruses Laboratory, Francis Crick Institute, London, UK. 6These authors contributed equally: Donald J. Benton, Antoni G. Wrobel. ✉e-mail: donald.benton@crick.ac.uk; antoni.wrobel@crick.ac.uk; steve.gamblin@crick.ac.uk

Article
Fig. 1 | Sequential steps in ACE2 binding of the SARS-CoV-2 spike protein. Surface representation of the spike, with monomers coloured in blue, rosy brown and gold, and ACE2 coloured in green. Each step shows two views of the spike complexes: a trimer axis vertical view (left) and an orthogonal top-down view along the axis (right). Clockwise from the top, we show structures for closed, open but unbound RBD, followed by sequential ACE2-binding events until reaching the fully open, three-ACE2-bound spike protein state. From this final trimeric species, we show dissociation into monomeric S1–ACE2, which may also occur for the one- or two-ACE2-bound species.
Panel labels: Closed; Open, one erect RBD; One ACE2 bound; One ACE2 bound, one erect RBD; Two ACE2 bound; Three ACE2 bound; Monomeric S1–ACE2.
Fig. 2 | Structural rearrangements between the closed and the ACE2-bound states of the spike protein. a, Surface representation of a monomer of S2 in the one-ACE2-bound, two-RBD-closed state coloured in light pink with the S1 subunit of the adjacent monomer in ribbon representation; the S1 of the one-ACE2-bound, two-RBD-closed state is shown in green and the three-RBD-closed state (PDB 6ZGE)26 is shown in blue. The atoms on the surface of S2 that contact the S1 intermediate domains are coloured in red. The arrows indicate the direction of movements of the intermediate domains, and of the RBD, between the closed and ACE2-bound conformations of the spike. b, Ribbon representations of the NTD-associated intermediate domain in blue and the moiety of the S2 chain that it interacts with (in red) in the closed conformation of the spike. Essential residues that participate in the interaction are labelled; of particular note is the salt bridge between Asp614 (S1, chain A) and Lys854 (S2, chain B). c, Ribbon representation of the same intermediate domain as in b, but in the conformation observed in the ACE2-bound structure of the spike (in green), in which the movement and refolding of the domain leads to a loss of interaction with S2, which becomes disordered. The putative fusion peptide (FP) and the S2′ site of the second protease cleavage at R815 adjacent to the region that undergoes unfolding are shown in dark red.
Panel labels: a, S1, S2, RBD, NTD, 8 Å, 3 Å. b (closed S1), R634, W633, Y636, F318, P295, D614, Y837, K854, R815, putative FP. c (ACE2-bound S1), R634, W633, Y636, F318, P295, D614, R815, putative FP, S2, unfolded 827–855.

Fig. 3 | Structural basis of S2 unsheathing by ACE2 binding. The spike protein is shown as a space-filling representation for S1, with each monomer coloured blue, rosy brown and gold, and as a ribbon representation for S2 coloured in red for all three monomers. Left, top-down and side-on views of the trimer in the closed conformation. Right, the same views for the fully open three-ACE2-bound species.
Panel labels: Closed (left); Three ACE2 bound (right).

Fig. 4 | ACE2-bound S1 subunit as a part of the spike trimer and as an isolated monomer. Space-filling representations of the spike protein with one monomer coloured polychromatically. NTD, yellow; NTD-associated subdomain, blue; RBD-associated subdomain, pink; RBD, rosy brown; S2, red; ACE2, green. The remainder of the trimer on the left is coloured grey. The structure on the right is aligned on the RBD:ACE2 moiety of the trimer complex on the left. The arrow indicates the direction of movement of the NTD and NTD-associated subdomain on the transition from the trimer (left) to the monomer species (right).
Panel labels: S1–ACE2 trimer (left); S1–ACE2 dissociated monomer (right).

Comparison of the trimers with one erect RBD that is either bound or unbound by an ACE2 receptor revealed two things. First, ACE2 binding alters the position of the open RBD by a rigid-body rotation of the domain that moves its centre of mass on average approximately 5.5 Å further away from the trimer axis; the NTD-associated and RBD-associated subdomains of the same monomer shift by around 1.9 Å and 2.3 Å, respectively (Extended Data Fig. 2c), and at the same time the NTDs of all three S1 components move by around 1.5–3.0 Å (Extended Data Fig. 2d). Similar changes in the domain orientation are observed in the recent structure of the SARS-CoV-2 spike complex with C105 Fab27 (Extended Data Fig. 2e), which binds at the ACE2-binding site. However, the molecular basis of both of these sets of changes remains unclear. Binding of more than one ACE2 molecule does not induce any substantial further changes in the average positioning of the RBD (Extended Data Fig. 2e). Second, our data suggest that ACE2 binding favours the open conformation of the RBD. The relatively high-affinity interaction of RBD with ACE2 generates an RBD–ACE2 structure that cannot be accommodated in a closed trimer: the bound state does not have access to the closed conformation. In addition, the fact that ACE2 binding induces a more-open conformation of the spike RBD suggests that some of the binding energy is used to drive the new conformation of S1, which is then further excluded from a closed state.

The successive steps, from closed unbound trimer to the fully open, three-ACE2-bound trimer, are associated with a substantial reduction in the contact area that each S1 makes with both its neighbouring S1 monomers and with the S2 trimeric core (Extended Data Table 1). For the fully open, three-ACE2-bound species, each S1 makes 1,400 Å² less contact with both its S1 trimer neighbours and 1,300 Å² less contact with the S2 core than in the fully closed trimer conformation; all of these rearrangements are driven by the energetics of the three ACE2-binding events. The movements of the RBD and NTD domains of S1 that are associated with the opening of the structure and stabilization of the new arrangement by ACE2 binding, as described above, leave a trimeric ring of S1 molecules that are attached to the S2 core only through contacts with its two small intermediate subdomains (Fig. 2a). Comparing the ACE2-bound, open form (the open-unbound structure is similar but of poorer local resolution) with the fully closed trimer, the RBD-associated intermediate subdomain moves about 8 Å, whereas the NTD-associated intermediate subdomain moves by 3 Å (Fig. 2a). The latter also undergoes a partial restructuring with possibly important implications for the mechanism of fusion activation of spike. In the closed form, one edge of the NTD-associated intermediate subdomain interacts with a short helix and a loop from S2 of the neighbouring monomer (Fig. 2b). Notably, two components of this interaction comprise a series of side-chain π-stacking interactions in the closed structure26: Tyr636, Phe318 and Arg634 of S1 with Tyr837 of S2; and a salt bridge formed by Asp614 of S1 with Lys854 of S2. By contrast, in the ACE2-bound form, Tyr636, Phe318 and Trp633 refold to the side of the domain further away from the symmetry axis (as viewed in Fig. 2c), leaving a channel to accommodate a new segment of α-helix that forms downstream of Asp614 from polypeptide chain that was previously disordered. As a consequence, the interactions between S1 and S2 described above for the closed form are lost in the ACE2-bound form and the segment comprising residues 827–855 of S2 becomes disordered (Fig. 2c). This part of S2 is immediately C-terminal to the putative fusion peptide of S2 (ref. 11), the N terminus of which is defined by Arg815 at the S2′ cleavage site9,11. The opening of the ACE2-stabilized S1 therefore leads to the destabilization of the S2 structure just after the putative fusion peptide, potentially activating it for exposure in the next stages of membrane fusion. Notably, Asp614, which forms salt bridges to Lys854 of S2 in the closed form, is frequently substituted13–15 by a glycine residue and it has been suggested that this substitution reduces shedding of S1 (and increases the number of spike proteins on the virus surface)13. We also propose that this substitution would remove a key salt bridge, and that the unique stereochemistry available to glycine may facilitate the formation of the new segment of α-helix, which is also incompatible with the S2 interaction. Furthermore, it could lead to reduced stability of the closed form of the spike protein, which in turn would increase the likelihood of the RBDs adopting the open conformation and hence the ability of the spike protein to bind to ACE2.

The opening up, and out, from the trimer axis of the S1 domains after ACE2 binding gives rise to an unshielding of the top surface of

the helix–loop–helix (approximately residues 980–990 within the HR1 region20,22,28,29) at the top of the S2 domain (Fig. 3). In the closed form, these helices and their connecting turns are tightly shielded by the RBDs; each S2 monomer is predominantly covered by its anticlockwise-related S1 trimer neighbour. In the fully open state, the S1 domains move in such a way as to generate a cavity with a diameter of 50 Å around the trimer axis that is about 65 Å deep. At the bottom of this cavity is the now solvent-exposed, central portion of HR1. For membrane fusion to occur (in comparison with other class-I fusion proteins and as described in coronavirus post-fusion structures22,28,29), the S2 component is likely to undergo a major helical rearrangement, in which the long trimer interface helix (spanning residues 990–1035) grows and extends, by incorporating the refolded turn and helix from the N-terminal portion of HR1, and projects the fusion peptide towards the host cell membrane. In this process, opening up of all three S1 monomers and their subsequent dissociation would enable the concerted helical refolding, as the cooperative displacement of the capping portions of the protein will probably be required for the extension of the helical coil, as has recently been observed for the haemagglutinin protein of influenza30. The stoichiometry of S1 subunit–ACE2 interactions required for effective cell-surface contact or for priming is not addressed by our experiments. However, as the affinity of individual monomers for ACE2 appears to be sufficient for cellular association, it may be that more than one subunit is required to be in the open form for efficient priming of these rearrangements in S2 that occur in the process of membrane fusion. It seems reasonable to propose that the likelihood of triggering the fusion conformation increases with the number of ACE2 receptors bound.

In addition to the range of species of trimeric spike described above, the largest single population of particles that we were able to identify and reconstruct represents ACE2 bound to an S1 monomer (Fig. 4). The interaction between ACE2 and the RBD, and the interaction of the latter with its associated intermediate subdomain, are very similar between the monomeric and trimer versions and with previously determined crystal and electron microscopy structures of ACE2 and RBD7,24,25. However, there are increasingly large rearrangements between the two intermediate subdomains and then with the NTD. By applying non-uniform refinement, the highest resolution was achieved for the reconstruction of the ACE2–RBD interaction (Extended Data Fig. 4), in part because of the tight interaction but also probably because of the dominant influence of this part of the structure on the alignment process. Nevertheless, it is clear that there are both increasingly large changes in the interfaces between domains on moving towards the NTD and a range of subpopulations of related but variable conformations. The high proportion of ACE2–S1 monomers, and the limited contact area between the trimeric S1 ring and S2, suggest that the fully open ACE2-bound spike complex is probably metastable.

Taken together, our structural data enable mechanistic suggestions for the early stages of SARS-CoV-2 infection of cells. The SARS-CoV-2 spike protein is produced in a compact closed form in which the helices in the S2 membrane fusion component are capped by the RBD of neighbouring monomers. After cleavage by furin between the S1 and S2 domains, the proportion of the spike trimers that is able to accommodate RBD in an open, ACE2-binding conformation increases26. Binding of the ACE2 receptor to an open RBD leads to a more-open trimer conformation. The geometry of ACE2 binding is incompatible with the RBD adopting a closed conformation and leads to our observation of several two-open-RBD conformations as well as the three-RBD-bound conformation. Successive RBD opening and ACE2 binding lead to a fully open and ACE2-bound form in which the trimeric S1 ring remains bound to the core S2 trimer by limited contacts through the intermediate subdomains of S1. This arrangement leaves the top of the S2 helices fully exposed. In the process, the interaction of the closed form of S1 with a segment of the S2 chain that precedes the putative fusion peptide region, in the open form, is lost. We suggest that in this form the S trimer is primed for the helical rearrangements of S2 that are required for fusion of the viral and host cell membranes28.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2772-0.

1. Zhou, P. et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270–273 (2020).
2. Wan, Y., Shang, J., Graham, R., Baric, R. S. & Li, F. Receptor recognition by the novel coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS coronavirus. J. Virol. 94, e00127-20 (2020).
3. Li, F., Li, W., Farzan, M. & Harrison, S. C. Structure of SARS coronavirus spike receptor-binding domain complexed with receptor. Science 309, 1864–1868 (2005).
4. Li, W. et al. Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature 426, 450–454 (2003).
5. Li, F. Structure, function, and evolution of coronavirus spike proteins. Annu. Rev. Virol. 3, 237–261 (2016).
6. Walls, A. C. et al. Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell 181, 281–292 (2020).
7. Shang, J. et al. Structural basis of receptor recognition by SARS-CoV-2. Nature 581, 221–224 (2020).
8. Belouzard, S., Chu, V. C. & Whittaker, G. R. Activation of the SARS coronavirus spike protein via sequential proteolytic cleavage at two distinct sites. Proc. Natl Acad. Sci. USA 106, 5871–5876 (2009).
9. Millet, J. K. & Whittaker, G. R. Host cell entry of Middle East respiratory syndrome coronavirus after two-step, furin-mediated activation of the spike protein. Proc. Natl Acad. Sci. USA 111, 15214–15219 (2014).
10. Hoffmann, M. et al. SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor. Cell 181, 271–280 (2020).
11. Lai, A. L., Millet, J. K., Daniel, S., Freed, J. H. & Whittaker, G. R. The SARS-CoV fusion peptide forms an extended bipartite fusion platform that perturbs membrane order in a calcium-dependent manner. J. Mol. Biol. 429, 3875–3892 (2017).
12. Song, W., Gui, M., Wang, X. & Xiang, Y. Cryo-EM structure of the SARS coronavirus spike glycoprotein in complex with its host cell receptor ACE2. PLoS Pathog. 14, e1007236 (2018).
13. Zhang, L. et al. The D614G mutation in the SARS-CoV-2 spike protein reduces S1 shedding and increases infectivity. Preprint at https://doi.org/10.1101/2020.06.12.148726 (2020).
14. Hu, J. et al. D614G mutation of SARS-CoV-2 spike protein enhances viral infectivity. Preprint at https://doi.org/10.1101/2020.06.20.161323 (2020).
15. Korber, B. et al. Tracking changes in SARS-CoV-2 spike: evidence that D614G increases infectivity of the COVID-19 virus. Cell 182, 812–827 (2020).
16. Wrapp, D. et al. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation. Science 367, 1260–1263 (2020).
17. Tortorici, M. A. et al. Structural basis for human coronavirus attachment to sialic acid receptors. Nat. Struct. Mol. Biol. 26, 481–489 (2019).
18. Yuan, Y. et al. Cryo-EM structures of MERS-CoV and SARS-CoV spike glycoproteins reveal the dynamic receptor binding domains. Nat. Commun. 8, 15092 (2017).
19. Kirchdoerfer, R. N. et al. Stabilized coronavirus spikes are resistant to conformational changes induced by receptor recognition or proteolysis. Sci. Rep. 8, 15701 (2018).
20. Pallesen, J. et al. Immunogenicity and structures of a rationally designed prefusion MERS-CoV spike antigen. Proc. Natl Acad. Sci. USA 114, E7348–E7357 (2017).
21. Walls, A. C. et al. Cryo-electron microscopy structure of a coronavirus spike glycoprotein trimer. Nature 531, 114–117 (2016).
22. Cai, Y. et al. Distinct conformational states of SARS-CoV-2 spike protein. Science 369, 1586–1592 (2020).
23. Gui, M. et al. Cryo-electron microscopy structures of the SARS-CoV spike glycoprotein reveal a prerequisite conformational state for receptor binding. Cell Res. 27, 119–129 (2017).
24. Lan, J. et al. Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor. Nature 581, 215–220 (2020).
25. Yan, R. et al. Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2. Science 367, 1444–1448 (2020).
26. Wrobel, A. G. et al. SARS-CoV-2 and bat RaTG13 spike glycoprotein structures inform on virus evolution and furin-cleavage effects. Nat. Struct. Mol. Biol. 27, 763–767 (2020).
27. Barnes, C. O. et al. Structures of human antibodies bound to SARS-CoV-2 spike reveal common epitopes and recurrent features of antibodies. Cell 182, 828–842 (2020).
28. Walls, A. C. et al. Tectonic conformational changes of a coronavirus spike glycoprotein promote membrane fusion. Proc. Natl Acad. Sci. USA 114, 11157–11162 (2017).
29. Fan, X., Cao, D., Kong, L. & Zhang, X. Cryo-EM analysis of the post-fusion structure of the SARS-CoV spike glycoprotein. Nat. Commun. 11, 3618 (2020).
30. Benton, D. J., Gamblin, S. J., Rosenthal, P. B. & Skehel, J. J. Structural transitions in influenza haemagglutinin at membrane fusion pH. Nature 583, 150–153 (2020).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in

© The Author(s), under exclusive licence to Springer Nature Limited 2020

Methods

Constructs design, protein expression and purification
The ectodomains of ACE2 (19–615) and stabilized, ‘2P’ mutant (K986P and V987P) of SARS-CoV-2 spike (residues 1–1208) with intact furin-cleavage site were prepared as described in a recent study26. In brief, the proteins were expressed in Expi293F cells (Gibco), collected twice after 3–4 and 6–7 days, and purified with affinity chromatography (spike using CoNTA resin from TAKARA, ACE2 with Streptactin XT resin from IBA Lifesciences), followed by gel filtration into a buffer containing 20 mM Tris pH 8.0 and 150 mM NaCl. As previously described26, the purified spike was then incubated for 5 h with exogenous furin (New England Biolabs), after which the reaction was stopped by addition of EDTA.

Electron microscopy sample preparation and data collection
R2/2 200-mesh Quantifoil grids were glow-discharged for 30 s at 25 mA to prepare them for freezing. The furin-treated SARS-CoV-2 spike was mixed with octyl glucoside as previously described26 and, 45–60 s before ultimately plunge-freezing the grid, with concentrated ACE2 at a 1:2 final molar ratio of trimeric spike:ACE2, aiming to obtain a final concentration of spike of 0.5 mg ml⁻¹ and octyl glucoside of 0.1%. Then, 4 μl of the obtained reaction mixture was applied on a grid pre-equilibrated at 4 °C in 100% humidity, blotted with filter paper for 4–4.5 s using Vitrobot Mark III, and plunge-frozen in liquid ethane. Data were collected using EPU software on a Titan Krios microscope operating at 300 kV. Micrographs were collected using a Gatan K2 detector mounted on a Gatan GIF Quantum energy filter operating in zero-loss mode with a slit width of 20 eV. Exposures were 8 s, fractionated into 32 frames with an accumulated dose of 54.4 e⁻ Å⁻², with a calibrated pixel size of 1.08 Å. Images were collected at a range of defoci between 1.5 and 3.0 μm.

Electron microscopy data processing
Movies were aligned using MotionCor2 (ref. 31) implemented in RELION32, followed by contrast transfer function (CTF) estimation using Ctffind4 (ref. 33). Particles were picked using crYOLO34 using a manually trained model. Particles were subjected to multiple rounds of two-dimensional classification using cryoSPARC35. Classes that displayed a clear secondary structure were retained and split into subsets, which either resembled spike trimers or S1 monomers bound to ACE2. Initial models were made using the ab initio reconstruction in cryoSPARC. Different species containing trimeric spike proteins were separated by extensive three-dimensional classification in RELION as shown in Extended Data Fig. 3. Before the final refinement, particles corresponding to each of these species were subjected to Bayesian polishing in RELION36 followed by homogeneous refinement in cryoSPARC coupled to CTF refinement. The monomeric S1–ACE2 complex was classified as in Extended Data Fig. 4a and refined using non-uniform refinement in cryoSPARC coupled to CTF refinement. The final particles from the S1–ACE2 complex were subjected to an unmasked refinement in RELION to better resolve less-ordered domains, with an overall lower global resolution (Extended Data Fig. 4b, c). Local resolution was estimated using blocres37 implemented in cryoSPARC. Maps were locally filtered and globally sharpened38 in cryoSPARC (Extended Data Figs. 5, 6).

Model building
The model for the monomeric S1–ACE2 complex was based on the previously determined crystal structure (PDB: 6M0J)24, with additional parts of the RBD and intermediate domain taken from a previous structure of the closed trimer (PDB: 6ZGE)26. Models of the trimer structures were built using structures from our previous study26 for the closed trimer (PDB: 6ZGE) and the one-erect-RBD structure (PDB: 6ZGG). The RBD–ACE2 parts of the model were built using the structure from the high resolution S1–ACE2 complex from this study. Models were manually adjusted using COOT39. The models of S1–ACE2 and the one-ACE2-bound closed structure were refined and validated using PHENIX real space refine40. The other, lower resolution models were refined using NAMDINATOR41 and geometry minimization and validation in PHENIX (Extended Data Table 2). Measurements were made using Chimera42, CCP4MG43 and PISA44, with structures aligned on the large helix of S2 (residues 986–1032).

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
Maps and models have been deposited in the Electron Microscopy Data Bank (EMD) and the Protein Data Bank (PDB) with the following accession codes: EMD-11681 and PDB 7A91 (dissociated S1 domain bound to ACE2 (non-uniform refinement)); EMD-11682 and PDB 7A92 (dissociated S1 domain bound to ACE2 (unmasked refinement)); EMD-11683 and PDB 7A93 (SARS-CoV-2 spike with two RBDs erect); EMD-11684 and PDB 7A94 (SARS-CoV-2 spike with one ACE2 bound); EMD-11685 and PDB 7A95 (SARS-CoV-2 spike with one ACE2 bound and one RBD erect in clockwise direction); EMD-11686 and PDB 7A96 (SARS-CoV-2 spike with one ACE2 bound and one RBD erect in anticlockwise direction); EMD-11687 and PDB 7A97 (SARS-CoV-2 spike with two ACE2 bound); EMD-11688 and PDB 7A98 (SARS-CoV-2 spike with three ACE2 bound).

31. Zheng, S. Q. et al. MotionCor2: anisotropic correction of beam-induced motion for improved cryo-electron microscopy. Nat. Methods 14, 331–332 (2017).
32. Scheres, S. H. W. RELION: implementation of a Bayesian approach to cryo-EM structure determination. J. Struct. Biol. 180, 519–530 (2012).
33. Rohou, A. & Grigorieff, N. CTFFIND4: fast and accurate defocus estimation from electron micrographs. J. Struct. Biol. 192, 216–221 (2015).
34. Wagner, T. et al. SPHIRE-crYOLO is a fast and accurate fully automated particle picker for cryo-EM. Commun. Biol. 2, 218 (2019).
35. Punjani, A., Rubinstein, J. L., Fleet, D. J. & Brubaker, M. A. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods 14, 290–296 (2017).
36. Zivanov, J., Nakane, T. & Scheres, S. H. W. A Bayesian approach to beam-induced motion correction in cryo-EM single-particle analysis. IUCrJ 6, 5–17 (2019).
37. Cardone, G., Heymann, J. B. & Steven, A. C. One number does not fit all: mapping local variations in resolution in cryo-EM reconstructions. J. Struct. Biol. 184, 226–236 (2013).
38. Rosenthal, P. B. & Henderson, R. Optimal determination of particle orientation, absolute hand, and contrast loss in single-particle electron cryomicroscopy. J. Mol. Biol. 333, 721–745 (2003).
39. Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. Acta Crystallogr. D 66, 486–501 (2010).
40. Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular structure solution. Acta Crystallogr. D 66, 213–221 (2010).
41. Kidmose, R. T. et al. Namdinator - automatic molecular dynamics flexible fitting of structural models into cryo-EM and crystallography experimental maps. IUCrJ 6, 526–531 (2019).
42. Pettersen, E. F. et al. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
43. McNicholas, S., Potterton, E., Wilson, K. S. & Noble, M. E. M. Presenting your structures: the CCP4mg molecular-graphics software. Acta Crystallogr. D 67, 386–394 (2011).
44. Krissinel, E. & Henrick, K. Inference of macromolecular assemblies from crystalline state. J. Mol. Biol. 372, 774–797 (2007).

Acknowledgements We thank A. Nans of the Structural Biology Science Technology Platform for assistance with data collection, P. Walker and A. Purkiss of the Structural Biology Science Technology Platform and the Scientific Computing Science Technology Platform for computational support, and L. Calder, P. Cherepanov, G. Kassiotis and S. Kjaer for discussions. This work was funded by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001078 and FC001143), the UK Medical Research Council (FC001078 and FC001143) and the Wellcome Trust (FC001078 and FC001143). P.X. is also supported by the 100 Top Talents Program of Sun Yat-sen University, the Sanming Project of Medicine in Shenzhen (SZSM201911003) and the Shenzhen Science and Technology Innovation Committee (grant no. JCYJ20190809151611269).

Author contributions D.J.B., A.G.W., P.X., C.R. and S.R.M. performed research, collected and analysed data; D.J.B., A.G.W., P.B.R., J.J.S. and S.J.G. conceived and designed research and wrote the paper.

Competing interests The authors declare no competing interests.

Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2772-0.
Correspondence and requests for materials should be addressed to D.J.B., A.G.W. or S.J.G.
Peer review information Nature thanks Stephen Harrison and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
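The exposure settings reported under "Electron microscopy sample preparation and data collection" (8 s exposures, 32 frames, an accumulated dose of 54.4 e⁻ Å⁻² and a calibrated pixel size of 1.08 Å) imply several per-frame quantities that are not stated in the Methods. The sketch below derives them by simple division. The variable names are illustrative, and the derived values are arithmetic consequences of the quoted settings, not figures reported by the authors.

```python
# Exposure settings quoted in the Methods.
total_dose = 54.4    # accumulated dose, electrons per square angstrom
n_frames = 32        # frames per exposure
exposure_s = 8.0     # exposure time, seconds
pixel_size = 1.08    # calibrated pixel size, angstrom per pixel

# Derived quantities (not quoted in the Methods).
dose_per_frame = total_dose / n_frames        # e-/A^2 per frame
frame_time_s = exposure_s / n_frames          # seconds per frame
dose_rate_area = total_dose / exposure_s      # e-/A^2 per second
dose_rate_pixel = dose_rate_area * pixel_size ** 2   # e-/pixel per second
nyquist_limit = 2 * pixel_size                # finest resolvable spacing, angstrom

print(f"dose per frame:      {dose_per_frame:.2f} e-/A^2")
print(f"frame time:          {frame_time_s:.2f} s")
print(f"dose rate (area):    {dose_rate_area:.2f} e-/A^2/s")
print(f"dose rate (pixel):   {dose_rate_pixel:.2f} e-/pixel/s")
print(f"Nyquist limit:       {nyquist_limit:.2f} A")
```

The Nyquist limit of twice the pixel size is a general sampling bound on the best resolution attainable from these images; the resolutions actually achieved are those reported in Extended Data Table 2.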
Extended Data Fig. 1 | Surface representation of obtained structures. The three monomers of S in each trimer are coloured in blue, rosy brown and gold with
ACE2 shown in green. Relative percentages of all trimeric S particles used to calculate electron microscopy maps are shown.
Extended Data Fig. 2 | Features of the obtained spike structures. a, Two three-dimensional classes, obtained by further classification of the one-ACE2-bound closed state from Fig. 1, representative of the range of motion of the RBD with bound ACE2, tilting away from the trimer axis of the spike trimer. The tilt of the RBD and ACE2 is indicated with a dashed line. b, Representative density of different obtained electron microscopy maps for residues 996–1030 of S2. Built model shown in pink, with EM density shown as a mesh. c, d, Comparison of spike structures for the open one-erect-RBD structure (purple) with the one-ACE2-bound structure (orange). c, S1 domains shown to highlight domain shifts of the RBD and RBD-associated intermediate domain. d, Outwards movements of spike domains (excluding RBDs). e, Comparison of RBD displacements of one-bound, two-bound and three-bound RBDs after binding of ACE2 to the unbound open structure of the spike protein (beige). These are compared to the RBD displacement after binding of the C105 Fab fragment27, which binds at the ACE2 interface of the RBD (PDB: 6XCM).
Extended Data Fig. 3 | Cryo-electron microscopy data processing scheme. Classes of particles used to obtain the final spike trimer structures, unbound and in complex with ACE2, are surrounded by a box of the same colour as the final maps shown at the bottom. The global resolution, final particle number and percentage for each trimer species are shown at the bottom.
Extended Data Fig. 4 | Monomeric S1 bound to ACE2. a, Classification scheme for the S1–ACE2 complex. b, c, Maps are shown of orthogonal views of the non-uniform refinement (b) and unmasked refinement (c) of the final S1 particles. Domains are coloured as follows: green, ACE2; yellow, NTD; rosy brown, RBD; pink, RBD-associated subdomain; blue, NTD-associated subdomain; cream, disordered density in b.
Extended Data Fig. 5 | Fourier shell correlation graphs for each of the determined structures. FSC, Fourier shell correlation.
Extended Data Fig. 6 | Maps and models of determined structures. Top, orthogonal views of electron microscopy density (grey) and ribbon diagram
representation of the models. Bottom, electron microscopy maps coloured by local resolution.
Extended Data Table 1 | Buried interface surface area between monomers in different conformations
Different conformations of unbound and ACE2-bound trimers were analysed. The interface area was calculated using PISA. In the open and ACE2-bound conformations, chain A is the one to
open first and to bind the receptor first, then B follows, if the second RBD changes the conformation. Chain B is the chain anticlockwise to A when looking down the symmetry axis with the
membrane-proximal part at the bottom. The unbound and three-ACE2-bound molecules are of C3 symmetry.
Extended Data Table 2 | Cryo-electron microscopy data collection, refinement and validation statistics
Corresponding author(s): Donald Benton, Antoni Wrobel, Steven Gamblin
Reporting Summary

Software and code

Data collection CryoEM data collected using Thermo Scientific EPU v2.7
Data analysis CryoEM data processed using the following packages: RELION-3.1, cryoSPARC v2.14, CTFfind4 v.4.1.10, MotionCor2 v.1.2.6, crYOLO v1.4,
Coot v.0.9, PHENIX v.1.17, UCSF Chimera v.1.12, UCSF ChimeraX v.0.5, CCP4MG v2.10, PISA v1.52
Data
Maps and models have been deposited in the Electron Microscopy Data Bank, http://www.ebi.ac.uk/pdbe/emdb/ and the Protein Data Bank,
https://www.ebi.ac.uk/pdbe/ with the following accession codes: EMD-11681 and PDB 7A91 (Dissociated S1 domain bound to ACE2 [Non-Uniform Refinement]);
EMD-11682 and PDB 7A92 (Dissociated S1 domain bound to ACE2 [Unmasked Refinement]); EMD-11683 and PDB 7A93 (SARS-CoV-2 S with 2 RBDs Erect);
EMD-11684 and PDB 7A94 (SARS-CoV-2 S with 1 ACE2 Bound); EMD-11685 and PDB 7A95 (SARS-CoV-2 S with 1 ACE2 Bound and 1 RBD Erect in Clockwise
Direction); EMD-11686 and PDB 7A96 (SARS-CoV-2 S with 1 ACE2 Bound and 1 RBD Erect in Anticlockwise Direction); EMD-11687 and PDB 7A97 (SARS-CoV-2 S with
2 ACE2 Bound); EMD-11688 and PDB 7A98 (SARS-CoV-2 S with 3 ACE2 Bound).

Sample size All cryoEM datasets consist of several thousand images. The number of images was sufficient to achieve the reported resolution, according
to the most commonly reported resolution measure in cryoEM described in Rosenthal and Henderson 2003, as cited in the manuscript
Data exclusions CryoEM single particles were included and excluded within the image processing workflow using standard image processing techniques such
as 2D and 3D classifications, as detailed in Extended Data Figures 3 and 4.
Replication Structures were determined using independent half datasets, according to standard procedures in cryoEM. Images were collected from three
independent replicate prepared grids, which all produced similar images both by low resolution visual inspection and high resolution class
averages. There were no unsuccessful replications.
Randomization Not applicable to this study, as samples were not assigned to experimental groups and data were collected and processed according to
standard techniques for cryoEM.
Blinding Not applicable to this study, as there was no experimental group allocation in data collection and analysis.



Cell line source(s) expi293F cells were purchased from Thermo Scientific and used for protein expression
Authentication Cell line used was not authenticated
Mycoplasma contamination Cell line was not tested for mycoplasma contamination
Article
A metastasis map of human cancer cell lines

https://doi.org/10.1038/s41586-020-2969-2
Received: 17 December 2018
Accepted: 26 August 2020

Xin Jin1 ✉, Zelalem Demere1, Karthik Nair1, Ahmed Ali1,2, Gino B. Ferraro3, Ted Natoli1,
Amy Deik1, Lia Petronio1, Andrew A. Tang1, Cong Zhu1, Li Wang1, Danny Rosenberg1,
Vamsi Mangena4, Jennifer Roth1, Kwanghun Chung1,4, Rakesh K. Jain3,5, Clary B. Clish1,
Matthew G. Vander Heiden1,2,6 & Todd R. Golub1,5,6 ✉

Most deaths from cancer are explained by metastasis, and yet large-scale metastasis
research has been impractical owing to the complexity of in vivo models. Here we
introduce an in vivo barcoding strategy that is capable of determining the metastatic
potential of human cancer cell lines in mouse xenografts at scale. We validated the
robustness, scalability and reproducibility of the method and applied it to 500 cell
lines1,2 spanning 21 types of solid tumour. We created a first-generation metastasis
map (MetMap) that reveals organ-specific patterns of metastasis, enabling these
patterns to be associated with clinical and genomic features. We demonstrate the
utility of MetMap by investigating the molecular basis of breast cancers capable of
metastasizing to the brain—a principal cause of death in patients with this type of
cancer. Breast cancers capable of metastasizing to the brain showed evidence of
altered lipid metabolism. Perturbation of lipid metabolism in these cells curbed brain
metastasis development, suggesting a therapeutic strategy to combat the disease and
demonstrating the utility of MetMap as a resource to support metastasis research.
Human cancer cell lines have been a driving force in cancer research,
leading to the discovery of oncogenic mechanisms and therapeutic
targets1–4. However, large-scale characterization of cell lines has been
limited to rudimentary readouts such as viability in cell culture, because
more complex phenotypes, such as behaviours in vivo, have not been
tractable at scale. By contrast, most studies of metastasis rely on only
a small number of experimental models5–9, thereby making it difficult
to extrapolate findings to genetically diverse human tumours10.
Ideally, it would be possible to construct a map of organ-specific
metastatic potential of hundreds of human cancer cell lines using
xenograft models, so that the molecular features of the cell lines could
be related to their ability to survive and proliferate in organ-specific
microenvironments. However, the prospect of in vivo testing of each
cell line individually is unattractive, because it is labour-intensive and
expensive, as well as because of the difficulty in sufficiently controlling
for variability between animal experiments. We proposed that if cell
lines were labelled with molecular barcodes and injected into recipient
mice as a pool, internally controlled, metastatic potential could be
assessed in a highly scalable manner.

Pilot study with breast cancer
To test the feasibility and reliability of in vivo barcoding to monitor
growth in different tissues in mice, we performed a pilot study using
four breast cancer cell lines (Fig. 1a, Extended Data Fig. 1, Supplementary
Note 1). Each cell line was engineered to express a unique
26-nucleotide barcode, together with luciferase for in vivo imaging
and either GFP or mCherry to facilitate subsequent cell sorting and
measurement of reproducibility within a single mouse (Extended Data
Fig. 1a, Supplementary Table 1). The 8 barcoded lines were injected
as a pool into the left ventricle of 5–6-week-old NOD-SCID-gamma
(NSG) mice so as to focus our analysis on the ability of tumour cells
to exit circulation and undergo expansion in distant organs.
Bioluminescence imaging (BLI) revealed metastatic lesions throughout
the body (Extended Data Fig. 1b). Five weeks after injection, brain,
lung, liver, kidney and bone were collected, human tumour cells were
isolated by fluorescence-activated cell sorting (FACS) using GFP
or mCherry, and barcodes were quantified using RNA sequencing
(RNA-seq) (Extended Data Fig. 1c–g). Whereas barcode abundances
were similar pre-injection, some barcodes were enriched in specific
organs (Extended Data Fig. 1h). Different cell lines exhibited distinct
patterns of metastatic spread, but each cell line showed a highly similar
pattern of spread across multiple mice independent of whether GFP
or mCherry versions were used, demonstrating the reproducibility of
this pooled approach (Extended Data Fig. 1d). For example, HCC1954
was most strongly detected in brain, whereas extracranial metastases
were dominated by MDAMB231. Barcodes quantified by bulk RNA-seq
were independently validated by quantitative PCR with reverse
transcription (RT–qPCR) and single-cell RNA-seq (Extended Data Fig. 1i–m,
Supplementary Note 1).
Having validated the method, we next characterized the metastatic
behaviours of all 21 basal-like breast cancer cell lines in the Cancer Cell
Line Encyclopedia (CCLE) (Extended Data Fig. 1a–d). Basal-like breast
cancers are known to have diverse metastatic abilities in patients11.
Reflecting this diversity, the cell lines showed disparate metastatic
patterns: pan-metastatic, metastatic preferentially to particular organs or
not metastatic (Fig. 1b, Supplementary Table 2). Notably, one cell line
(BT20) was detected in multiple organs, but at very low abundance in all
of them, reflecting its ability to colonize but not expand. To validate the
patterns of metastasis observed in the pooled in vivo system, we selected
1Broad Institute of MIT and Harvard, Cambridge, MA, USA. 2Koch Institute for Integrative Cancer Research, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA.
3Edwin L. Steele Laboratories, Department of Radiation Oncology, Massachusetts General Hospital, Boston, MA, USA. 4Institute for Medical Engineering and Science, Picower Institute for
Learning and Memory, Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA. 5Harvard Medical School, Boston, MA, USA. 6Dana-Farber Cancer
Institute, Boston, MA, USA. ✉e-mail: xjin@broadinstitute.org; golub@broadinstitute.org

eight cell lines for individual characterization, and observed similar
results from the pooled and individual screens (Extended Data Fig. 1n, o).

A metastasis map of 500 human cancer cell lines
Having demonstrated its feasibility in breast cancer, we attempted
to expand the mapping of metastatic potential to human cancer cell
lines from diverse lineages. To facilitate higher-throughput profiling,
we used cell lines barcoded for use with the PRISM method, which
was developed for in vitro drug-sensitivity screening12. A simplified
workflow enabled the quantitative detection of barcodes from crude
tissue lysates without the need for FACS-based tumour cell purification
(Extended Data Fig. 2, Supplementary Note 2). We applied this method
to 500 cell lines spanning 21 lineages to develop a first-generation
Metastasis Map (MetMap) (Fig. 2a). The data and interactive visualization
are publicly accessible at https://pubs.broadinstitute.org/metmap.
To test the robustness of the MetMap dataset, we tested cell lines in
two formats: in one, we injected all 498 cell lines as a single pool; in the
other, we injected 5 pools of 25 lines, with each pool being injected into
different mice (referred to as MetMap500 and MetMap125, respectively)
(Fig. 2b). We similarly varied cell numbers, mouse age and cohort size to
determine whether results varied substantially with these parameters.
We observed strong correlation of the metastatic potential despite
differences in experimental conditions (Fig. 2c), suggesting that the
approach is robust to these parameters. We also note that intracardiac
injection enabled the evaluation of many more cell lines in vivo compared
with subcutaneous injection. Specifically, we recovered an average of
197 cell lines per mouse following intracardiac injection, whereas an
average of 42 cell lines were recovered following subcutaneous injection
(Extended Data Fig. 3a–c). We suspect that this difference is explained
by the local competition for nutrients and other microenvironmental
factors in the subcutaneous setting, whereas the spatial separation
of tumour cells delivered through the intracardiac route minimizes
such competition. A similarly reduced diversity was observed in the
orthotopic setting, where injection of a pool of nine breast cancer cell
lines into the mammary fat pad resulted in a single cell line dominating
the resulting tumour (Extended Data Fig. 3d).
To determine whether the MetMap reflects the metastatic behaviour
of human cancers, we analysed available clinical annotations of the cell
lines (Fig. 3a–e, Extended Data Fig. 4). We found statistically significant
associations with tumour lineage, the site from which the cell line was
derived (primary tumour versus metastatic lesion) and patient age.
There was no association between metastatic potential and gender
or ethnicity. As expected, metastatic potential was higher in certain
tumour types, such as melanoma and pancreatic cancer, which also
tend to develop metastasis in the human disease setting13. By contrast,
cell lines derived from brain tumours were generally non-metastatic,
reflective of their tendency to not undergo haematogenous spread14,15.
Similarly, the DU145 prostate cancer cell line, derived from a brain
metastasis lesion16, exhibited brain metastasis in our experimental
system. Cell lines derived from metastases showed higher metastatic
potential than lines derived from primary tumours, although lines
derived from primary tumours known to later give rise to metastases
in patients were metastatic in the MetMap (Fig. 3b), consistent with
previously reported suggestions that metastatic potential is already
encoded in primary tumours17–19. The association between decreased
metastatic potential and increased patient age was unexpected (Fig. 3c),
and its basis remains to be determined.
Perhaps most importantly, extensive variation in metastatic potential
was observed within individual lineages, making it possible to search
for associations between metastasis propensity and genomic features
of the tumours. Of note, metastatic potential was not simply explained
by proliferation rate or mutational burden (Fig. 3f–h, Extended Data
Fig. 4f, g), suggesting that more subtle molecular determinants of
metastasis were involved.

Molecular correlates of brain metastasis
To develop mechanistic insights, we focused on breast cancer and its
potential for brain metastasis (Fig. 1b), because brain metastasis is a
feature of some, but not all, breast cancers, and little is known about
the underlying factors that could inform therapeutic approaches20,21.
We therefore undertook a systematic and unbiased comparison of
the molecular features that distinguished brain metastatic versus
non-metastatic lines, using genomic data available for each of the
cell lines.
At the level of somatic mutations, PIK3CA was the top associated
correlate: 4 out of 7 brain metastatic lines contained a PIK3CA mutation,
compared with 0 out of 14 non-metastatic or weakly metastatic lines
(false discovery rate (FDR) = 0.0034) (Fig. 4a, Extended Data Fig. 5a).
A fifth line, HCC70, has a loss-of-function mutation in PTEN. PI3K is a
principal downstream mediator of ERBB2 (also known as HER2), which itself
has been reported to be associated with brain metastasis in humans11,20.
Indeed, two of the brain metastatic cell lines (JIMT1 and HCC1954) also
contain typical ERBB2 gene amplifications (Extended Data Fig. 5a).
At the level of DNA copy number, we observed an association between
metastatic potential and deletions of chromosome 8p12–8p21.2
(referred to as 8p) (FDR = 0.0017) (Fig. 4b). Five out of seven brain
metastatic breast cancer cell lines contained deletions in this region,
compared with 0 out of 14 non-metastatic lines (Extended Data Fig. 5a).
A sixth metastatic line, JIMT1, has a small deletion within this commonly
deleted region.
To ascertain the clinical relevance of these associations, we analysed
clinical breast cancer datasets for which metastasis information was
available18. We observed a strong correlation between 8p copy number
and gene expression in the METABRIC and TCGA datasets22,23 (Extended
Data Fig. 6a), thereby validating 8p expression as a surrogate for copy
number in datasets for which copy number data were not available.

Fig. 1 | Scalable in vivo metastatic potential mapping with barcoded cell
line pools. a, A schematic of the experiment determining the feasibility of
in vivo metastatic potential profiling using barcoded cell line pools. Barcode
abundance reflecting cancer cell compositions was determined by RNA-seq,
and the cell number of each cell line was inferred by cancer cell composition
and total cancer cell counts isolated from the target organ. b, Petal plots
displaying the metastatic patterns of 21 basal-like breast cancer cell lines. Petal
length represents metastatic potential, quantifying the mean of inferred
cancer cell numbers detected from the target organs. Data are mean ± 95%
confidence interval. Petal width shows penetrance, quantifying percentage of
mice detected with the cell line.
Panel a labels: barcoded cell line pool (pilot, group 1, group 2); metastasis
formation (brain, lung, liver, kidney, bone); isolate cells from organs (tissue
dissociation, mouse cell depletion, FACS); RNA-seq (barcode transcripts, cancer
transcripts); cancer cell composition; in vivo expression; inferred cell number;
metastatic potential.
Panel b cell lines: MDAMB231, HCC1187, JIMT1, HCC1806, HMC18, HCC70, HCC1954,
MDAMB468, BT20, MDAMB436, HCC1599, DU4475, HCC1395, MDAMB157, HCC1569, HCC38,
HDQP1, HCC1937, BT549, CAL851, HCC1143.
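The two quantities defined for Fig. 1b (petal length as mean inferred cell number, petal width as penetrance) can be made concrete with a minimal sketch. It assumes hypothetical inputs: per-mouse barcode read counts for one organ and the total number of cancer cells isolated from that organ. The function names, the detection threshold `min_cells`, the cell-line names and all numbers are illustrative assumptions, not the authors' pipeline or data.

```python
from statistics import mean


def inferred_cell_numbers(barcode_counts, total_cancer_cells):
    """Split one organ's total cancer cell count by barcode read fraction."""
    total_reads = sum(barcode_counts.values())
    if total_reads == 0:
        return {line: 0.0 for line in barcode_counts}
    return {
        line: total_cancer_cells * reads / total_reads
        for line, reads in barcode_counts.items()
    }


def summarize(per_mouse, min_cells=1.0):
    """Summarize inferred cell numbers across mice for one organ.

    per_mouse: list of {cell_line: inferred cell number}, one dict per mouse.
    min_cells: assumed detection threshold (hypothetical).
    """
    lines = sorted({line for mouse in per_mouse for line in mouse})
    summary = {}
    for line in lines:
        values = [mouse.get(line, 0.0) for mouse in per_mouse]
        summary[line] = {
            # Mean inferred cell number across mice (petal length).
            "metastatic_potential": mean(values),
            # Fraction of mice in which the line is detected (petal width).
            "penetrance": sum(v >= min_cells for v in values) / len(values),
        }
    return summary


# Made-up example: two mice, two hypothetical cell lines, one organ.
mice = [
    ({"LINE_A": 900, "LINE_B": 100}, 2000),  # (barcode reads, total cells)
    ({"LINE_A": 500, "LINE_B": 0}, 1000),
]
per_mouse = [inferred_cell_numbers(counts, total) for counts, total in mice]
summary = summarize(per_mouse)
print(summary)
```

In this toy input, LINE_A is detected in both mice and LINE_B in only one, so the two lines differ in both mean inferred cell number and penetrance.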

Fig. 2 | Drafting MetMap for 500 human cancer cell lines. a, A schematic of
the workflow using pan-cancer PRISM cell line pools for high-throughput
metastatic potential profiling. Relative metastatic potential was quantified by
deep sequencing of PRISM barcode abundance from tissue. The cancer lineage
distribution of the profiled 500 cancer cell lines is presented, with each dot
representing a cell line, and showing whether the cell line was derived from
primary tumour or metastasis. b, Comparison of experimental conditions
between MetMap500 and MetMap125. c, Scatter plots showing overall and
organ-specific metastatic potential as determined in MetMap500 and
MetMap125. Strong correlation is observed between the two experiments.
Each dot represents a cell line. Cancer lineage is colour-coded as in a.
Panel a lineages: neuroblastoma, head and neck, mesothelioma, endometrium,
oesophageal, pancreatic, melanoma, colorectal, sarcoma, bile duct, prostate,
bladder, ovarian, thyroid, gastric, kidney, breast, bone, brain, lung, liver.
Panel b conditions (MetMap500 versus MetMap125): 498 lines in 1 pool versus
25 lines × 5 pools; ~500 versus ~10k cells per line; 15 mice versus ~5 mice per
pool; 8–10-week-old versus 5–6-week-old female mice; 5 versus 4 organs. Shared
between the two: 120 cell lines and 4 organs.
Panel c correlations (Pearson's r): overall, 4 organs combined, 0.80
(P < 2.2 × 10^-16); brain, 0.63 (P = 9.8 × 10^-15); lung, 0.75 (P < 2.2 × 10^-16);
liver, 0.84 (P < 2.2 × 10^-16); bone, 0.60 (P = 5.1 × 10^-13).
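The comparison in Fig. 2c amounts to correlating metastatic potential between two experiments over the cell lines they share. The sketch below is one plausible way to do this, assuming the correlation is taken on log10-transformed values (the axes are logarithmic) and assuming a floor of 1e-4 for values at or near zero. Both choices, and the dictionary inputs and line names, are assumptions for illustration rather than the published analysis.

```python
import math


def pearson_r(x, y):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log_potential_correlation(exp_a, exp_b, floor=1e-4):
    """Correlate log10 metastatic potential over cell lines present in both.

    exp_a, exp_b: {cell_line: metastatic potential} for each experiment.
    floor: assumed lower bound applied before taking logs (hypothetical).
    """
    shared = sorted(set(exp_a) & set(exp_b))
    log_a = [math.log10(max(exp_a[k], floor)) for k in shared]
    log_b = [math.log10(max(exp_b[k], floor)) for k in shared]
    return pearson_r(log_a, log_b), len(shared)


# Made-up values for hypothetical lines; L5 is absent from the first experiment.
pool_500 = {"L1": 1e-4, "L2": 1e-2, "L3": 1.0, "L4": 100.0}
pool_125 = {"L1": 1e-3, "L2": 1e-1, "L3": 10.0, "L4": 1000.0, "L5": 5.0}
r, n_shared = log_potential_correlation(pool_500, pool_125)
print(n_shared, round(r, 3))
```

Restricting to shared lines mirrors the figure's note that only a subset of cell lines appears in both experiments.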
Coordinated expression of 8p genes stratified tumours into two clusters,
with the low-expressing cluster showing enrichment in brain
metastasis and lower brain metastasis-free survival (Extended Data
Fig. 6b). Whereas 8p loss was more frequent in basal-subtype breast
cancer (known to have poor prognosis), 8p loss remained significantly
associated with brain metastases within basal tumours. A similar trend
was seen in other subtypes, but the sample size was too small to reach
statistical significance. Concordant with these findings, the 8p-low
signature was strongly enriched in brain metastasis lesions compared
with extracranial metastases or primary tumours24 (Extended Data
Fig. 6c, d). Similarly, we observed that response signatures25,26 indicating
PI3K activation are associated with brain metastases (Extended
Data Fig. 6e–g). The PI3K-high signature tended to co-occur with the
8p-low signature, and the overlapping events captured the majority
of patients with brain metastases (Extended Data Fig. 6h, i). These
results established the validity of the MetMap experimental system
for discovery.

Lipid metabolism and brain metastasis
Confirming these genetic findings, expression analysis revealed enrichment
of PI3K and ERBB2 signatures in the brain metastatic cell lines
(Fig. 4c). Furthermore, we observed a strong association between
brain metastatic potential and a lipid-synthesis signature (Fig. 4c),
which has been reported in association with both PI3K activation and
8p-deletion27,28. To investigate a potential role of lipid metabolism in
breast-cancer brain metastatic potential, we analysed the abundance
of lipid metabolites across the cell lines29. We observed increased
levels of cholesterol species in highly brain metastatic cells (Fig. 4d).
In addition to cholesterols, membrane lipids including phosphatidylcholine
and sphingomyelin were similarly more abundant, as were
metabolites associated with the pentose phosphate pathway30, which
can support cholesterol and lipid synthesis. By contrast, we observed
global decreases in levels of triacylglycerols (TAGs) in brain metastatic
cells (Fig. 4d). Non-brain metastatic cells had higher levels of TAGs and

contained a fatty acid oxidation signature (Fig. 4c). Metabolite profiling
of normal mouse tissues31 showed that the brain has markedly lower
levels of TAGs compared with other tissues (Fig. 4e). This reflects brain
physiology, whereby instead of storing fatty acids as TAGs, the brain
accumulates specialized lipids to support neural activity and brain
function32. One possibility is that for breast cancer cells to survive in the
brain microenvironment, where TAGs and other storage lipids present
in other tissues are not abundant, they must access lipids via de novo
synthesis or another route, in line with the seed-and-soil hypothesis33.
To further investigate the characteristics of breast cancer cell lines
capable of brain metastasis, we analysed genome-wide CRISPR–Cas9
viability-screening data34 to identify gene vulnerabilities associated with
the brain metastatic state. We identified SREBF1 as the top-correlated
dependency with brain metastasis (FDR = 0.001) (Fig. 4f). SREBF1 is a
pivotal transcription factor that mediates lipid synthesis downstream
of the PI3K pathway27,35. SREBF1 was selectively required for growth of
brain metastatic lines in culture compared with breast cancer lines with
low or no brain metastatic potential. The association was specific to
brain, as no association was observed between SREBF1 essentiality and
metastasis to other organs (Fig. 4f). This SREBF1–breast-cancer brain
metastasis association was also recovered in the MetMap500 dataset,
indicating strong reproducibility of the finding (Extended Data
Fig. 5b, c). Of note, the SREBF1 paralogue SREBF2 showed no association
between its essentiality in culture and metastatic potential (Fig. 4g).

Fig. 3 | Clinical correlates of metastatic potential. a–e, Single-variate
correlation of different clinical parameters with overall metastatic potential
from MetMap500 data. Primary with metastasis indicates that the cell line was
derived from the primary tumour and the donor developed metastasis at
diagnosis or later. In box plots, boxes display quartiles of the data; outlier
points extend beyond 1.5× interquartile ranges from either hinge. Cancer
lineage is colour-coded as in Fig. 2a. f–h, Single-variate correlation of cell
doubling, mutation burden and aneuploidy status with overall metastatic
potential from MetMap500 data. f, Doubling time in hours. g, Mutation burden
quantified by somatic mutations from exon-sequencing data. h, Aneuploidy
quantified by chromosome-arm-level events from exon-sequencing data. Each
dot represents a cell line.
Panel statistics: a, cancer type, P = 2.2 × 10^-7; b, derived from, P = 0.00028
(primary tumour versus metastasis) and P = 5.5 × 10^-5 (primary tumour, primary
with metastasis, metastasis; pairwise P = 0.055 and P = 0.28); c, age, P = 0.0077;
d, gender, P = 0.55; e, ethnicity, P = 0.91; f, doubling time (h), P = 0.058;
g, mutation burden, P = 0.52; h, aneuploidy, P = 0.23.

To investigate the role of SREBF1 in affecting the lipid phenotype
observed in brain metastatic cells, we performed lipidomics after
knocking out SREBF1 in JIMT1 and HCC1806 cells using CRISPR–Cas9.
SREBF1 knockout resulted in a marked shift in intracellular lipid content,
including a decrease in levels of cholesterol, membrane lipids and
diacylglycerols (Fig. 4h). SREBF1 knockout also resulted in an increase
in intracellular TAG levels, presumably by scavenging TAGs from the
lipid-rich serum added to the culture medium. To test this hypothesis,
we repeated the experiment in culture medium prepared with
delipidated serum, which prevented the increase in TAGs observed in
SREBF1-knockout cells (Extended Data Fig. 7).
To further explore the role of SREBF1, we performed RNA-seq following
SREBF1 knockout and found SCD (ref. 35) to be the most consistently
downregulated gene (Fig. 4i). Consistent with this, SCD was the top
co-dependency of SREBF1 across 688 cell lines in the genome-wide
CRISPR–Cas9 viability screens (Fig. 4j). The next highest scoring SREBF1
co-dependency was SCAP, which encodes the upstream activator of
SREBF1 (ref. 35). Comparison of gene expression in breast cancer cells grown
in vitro or in the brain similarly showed that in the brain, cells adopted
gene-expression signatures of adipogenesis, fatty acid metabolism
and xenobiotic metabolism (Extended Data Fig. 8, Supplementary
Note 3). The enrichment of lipid-metabolism signatures (including
upregulation of SREBF1 and SCD) was unique to brain compared with
other sites of metastasis. Similar upregulation was also observed in
brain metastases from patients compared with extracranial metastases
or their matched primary tumours36 (Extended Data Fig. 9). Furthermore,
the requirement for SREBF1, SCD, SCAP and other members of
the lipid-metabolism pathway for brain metastasis formation was
confirmed in both mini-pool and individual gene-knockout experiments
(Fig. 5a–c, Supplementary Note 4). Together, these genetic, metabolic,
transcriptomic and functional genomic lines of evidence all point to an
association between SREBF1-mediated lipid metabolism and brain metastasis.
Given the observation that SREBF1 knockout resulted in a viability
defect in vitro (Extended Data Fig. 10a), we compared the relative effect
of knockout on metastasis to different organs, to determine whether the
viability defect was preferentially observed in brain (Fig. 5d). Five weeks
following intracardiac injection of SREBF1-knockout cells, we observed a
marked defect in brain metastasis (196-fold reduction), compared with a
modest defect in other organs (9–21 fold) (Fig. 5d). Histologic examination
of brains from xenografted mice revealed large metastatic lesions in
mice receiving wild-type cells, whereas those receiving SREBF1-knockout
cells contained micrometastases (Extended Data Fig. 10b), suggesting
that SREBF1 is not required for seeding the brain, but rather for
proliferation in the brain microenvironment. Consistent with this hypothesis,
injection of tumour cells into the carotid artery increased the probability
of seeding the brain, but nevertheless a marked growth defect was still
observed in SREBF1-knockout cells (Fig. 5e).
To determine the generality of the SREBF1 requirement for breast
cancer growth in the brain, we knocked out SREBF1 in additional
brain metastatic lines including HCC1954, MDAMB231 and HCC1806
using CRISPR–Cas9. As with JIMT1, a significant inhibition in brain
metastatic growth was also observed in these lines, although the
magnitude and duration of growth inhibition varied (Extended Data
Fig. 10c, d). The least responsive cell line was HCC1806, in which
SREBF1-knockout cells displayed a brain growth defect for the first
week, but then assumed a growth trajectory that paralleled wild-type
cells. This restoration of growth was not explained by reversion of the
genome editing, as brain metastases at the end of the experiment
showed evidence of editing at the SREBF1 locus and minimal SREBF1
protein expression (Extended Data Fig. 10e, f). Instead, we found that
the SREBF1-independent growth was associated with upregulation
of the fatty acid transporter CD36 and the fatty acid-binding protein
FABP6 (Extended Data Fig. 10g). Of note, culture of HCC1806 in mouse
brain-slice-conditioned medium similarly resulted in upregulation of
SCD and CD36 expression (Extended Data Fig. 10h, i). JIMT1 cells did

Fig. 4 | An altered lipid-metabolism state associates with brain metastatic
potential in basal-like breast cancer. a, Somatic mutations that associate
with brain metastatic potential in the basal-like breast cancer cohort. The top
correlate, PIK3CA, reaches statistical significance (FDR = 0.0034, highlighted
in bold). All PIK3CA mutations are activating. Positive correlations are in red,
negative correlations are in blue. Selected known oncogenes or tumour
suppressors in basal-like breast cancer are presented for comparison.
b, Alterations in copy number that associate with brain metastatic potential.
The top correlates cluster in chr 8p12–8p21.2 (FDR = 0.0017, highlighted in
bold). c, Gene-expression signatures that associate with brain metastatic
potential. Bars indicate P values. Expression signature scores were projected
for each cell line with their in vitro RNA-seq data and used for regression
analysis. GO (Gene Ontology), Hallmark, Reactome and Burton are gene sets in
the MSigDB gene set enrichment analysis (GSEA) collection. d, Lipid-metabolite
species that associate with brain metastatic potential. Bars indicate P values.
Lipid metabolites measured by mass spectrometry were grouped by species,
and enrichment analysis of the species was performed using GSEA. CE,
cholesterol ester; PC, phosphatidylcholine; SM, sphingomyelin; LPC,
lysophosphatidylcholine; LPE, lysophosphatidylethanolamine; DAG,
diacylglycerol; PPP, pentose phosphate pathway metabolites. e, Heat map
presenting distribution of lipid species measured by mass spectrometry from
different mouse tissues. Gastroc, gastrocnemius. f, CRISPR gene dependencies
that associate with brain metastatic potential. The top gene, SREBF1
(FDR = 0.001), is a selective dependency in highly brain metastatic lines.
Positive correlations are in red, negative correlations are in blue.
g, Distribution of SREBF1 (top) and SREBF2 (bottom) dependencies across 688
human cancer cell lines. The positions of highly brain metastatic (met) breast
lines are highlighted in red, whereas weakly metastatic or non-brain metastatic
breast lines are highlighted in blue. h, Consensus alterations in lipid species
abundance upon SREBF1 knockout (KO) in JIMT1 and HCC1806, two brain
metastatic cell lines. Bars indicate adjusted P values. Lipid metabolites
measured by mass spectrometry were grouped by species, and enrichment
analysis of the species was performed using GSEA. WT, wild type. i, Consensus
gene-expression changes upon SREBF1 knockout in JIMT1, HCC1806, HCC1954
and MDAMB231, four brain metastatic cell lines. The two top genes are SREBF1
and SCD (FDR < 0.05, highlighted in bold). j, Co-dependencies of SREBF1 across
688 human cancer cell lines in genome-wide CRISPR viability screen. The two
top genes are SCD and SCAP (FDR < 1 × 10^-60, highlighted in bold).
Panel titles: a, mutation; b, copy number alteration; c, expression signature;
d, metabolite; e, lipid species; f, CRISPR dependency; h, lipidomics, SREBF1-KO
versus WT; i, RNA-seq, SREBF1-KO versus WT; j, SREBF1 co-dependency.
Gene sets labelled in panel c: Burton, adipogenesis peak at 8 h; Hallmark,
PI3K–AKT–MTOR signaling; GO, ERBB signalling pathway; GO, ERBB2 signalling
pathway; GO, carnitine metabolic process; Reactome, mitochondrial fatty acid
β-oxidation; GO, short chain fatty acid metabolic process.
not upregulate CD36 or FABP6 expression following SREBF1 knockout (Extended Data Fig. 10g), perhaps explaining their inability to survive in the brain. Together, these results further demonstrate the relationship between lipid metabolism and brain metastasis, as cells under the selective pressure of SREBF1 loss must acquire lipids by other means to survive in the brain microenvironment.

[Figure 5 graphic omitted. Panel labels recoverable from the extraction: a, screen schematic (JIMT1-Cas9; arrayed guide infection; pooling; intracranial injection; guide amplification from brain lesion; guide dropout quantification). b, volcano plot of effect size against −log10(adj. P), with PMVK, UBIAD1, IRX3, SCD, SCAP, ACLY and SREBF1 labelled. c, JIMT1 intracranial: BLI against days post injection for WT, SREBF1-KO, SCAP-KO, SCD-KO, ACLY-KO, PMVK-KO and IRX3-KO, guides g1 and g2. d, JIMT1 intracardiac: BLI and BLI fold change (WT versus KO) for brain, lung, liver, bone and whole body. e, JIMT1 intracarotid: BLI and BLI fold change (WT versus KO). Fold-change annotations printed across panels d and e: 196-fold, 10-fold, 21-fold, 9-fold, 9-fold and 111-fold.]

Fig. 5 | Investigation of lipid-metabolism genes in breast cancer brain metastasis. a, A schematic of an in vivo CRISPR screen investigating relative gene fitness in brain metastasis outgrowth. b, Volcano plot showing the result of a mini-pool in vivo CRISPR screen targeting 29 lipid-metabolism-related genes. Thirteen genes scored at FDR < 0.05, with selective hits highlighted. c, Individual gene validation of six hits by intracranial injection of JIMT1 edited cells. Cell outgrowth in brain metastasis was monitored by real-time BLI. Two independent guides per gene were tested, with one guide per mouse. d, BLI and quantification of relative fold change in metastasis load in the organs of mice receiving intracardiac injection of wild-type (WT) or SREBF1-knockout (KO) JIMT1 cells. Data are mean ± s.e.m. Each group contains five mice. e, BLI and quantification of relative fold change in brain metastasis load in mice receiving intracarotid injection of wild-type or SREBF1-KO JIMT1 cells. Data are mean ± s.e.m. n = 7 (wild-type) and n = 8 (knockout) mice.

Discussion
This work describes MetMap as an approach for large-scale in vivo characterization of human cancer cell lines. The MetMap resource (available at https://pubs.broadinstitute.org/metmap) currently includes metastasis profiles of 500 cell lines spanning 21 tumour types, providing a large repertoire of models for exploration of metastasis mechanisms. A limitation of the use of human cell lines for such experiments is that they require the use of immunodeficient mice. The extent to which the immune system has a role in mediating patterns of metastasis remains to be determined37.

We followed up only a small proportion of the MetMap findings, specifically breast cancer metastasis to brain. Multiple lines of experimental and clinical evidence pointed to a role of lipid metabolism in governing the ability of cells to survive in the brain microenvironment. The importance of lipid metabolism in cancer has been highlighted by a number of studies, but its role in brain metastasis has, to our knowledge, not been fully appreciated38–41. The possibility that interfering with lipid or cholesterol metabolism might abrogate metastatic growth in the brain is particularly intriguing. More generally, this work illustrates the complex interplay between cancer cell growth and the tissue microenvironment.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2969-2.

1. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
2. Ghandi, M. et al. Next-generation characterization of the Cancer Cell Line Encyclopedia. Nature 569, 503–508 (2019).
3. Garnett, M. J. et al. Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature 483, 570–575 (2012).
4. Behan, F. M. et al. Prioritization of cancer therapeutic targets using CRISPR–Cas9 screens. Nature 568, 511–516 (2019).
5. Kang, Y. et al. A multigenic program mediating breast cancer metastasis to bone. Cancer Cell 3, 537–549 (2003).
6. Chen, S. et al. Genome-wide CRISPR screen in a mouse model of tumor growth and metastasis. Cell 160, 1246–1260 (2015).
7. Malladi, S. et al. Metastatic latency and immune evasion through autocrine inhibition of WNT. Cell 165, 45–60 (2016).
8. van der Weyden, L. et al. Genome-wide in vivo screen identifies novel host regulators of metastatic colonization. Nature 541, 233–236 (2017).
9. Tasdogan, A. et al. Metabolic heterogeneity confers differences in melanoma metastatic potential. Nature 577, 115–120 (2020).
10. Robinson, D. R. et al. Integrative clinical genomics of metastatic cancer. Nature 548, 297–303 (2017).
11. Kennecke, H. et al. Metastatic behavior of breast cancer subtypes. J. Clin. Oncol. 28, 3271–3277 (2010).
12. Yu, C. et al. High-throughput identification of genotype-specific cancer vulnerabilities in mixtures of barcoded tumor cell lines. Nat. Biotechnol. 34, 419–423 (2016).
13. Budczies, J. et al. The landscape of metastatic progression patterns across major human cancers. Oncotarget 6, 570–583 (2015).
14. Müller, C. et al. Hematogenous dissemination of glioblastoma multiforme. Sci. Transl. Med. 6, 247ra101 (2014).
15. Fonkem, E., Lun, M. & Wong, E. T. Rare phenomenon of extracranial metastasis of glioblastoma. J. Clin. Oncol. 29, 4594–4595 (2011).
16. Stone, K. R., Mickey, D. D., Wunderli, H., Mickey, G. H. & Paulson, D. F. Isolation of a human prostate carcinoma cell line (DU 145). Int. J. Cancer 21, 274–281 (1978).
17. Ramaswamy, S., Ross, K. N., Lander, E. S. & Golub, T. R. A molecular signature of metastasis in primary solid tumors. Nat. Genet. 33, 49–54 (2003).
18. Zhang, X. H.-F. et al. Selection of bone metastasis seeds by mesenchymal signals in the primary tumor stroma. Cell 154, 1060–1073 (2013).
19. Campbell, P. J. et al. The patterns and dynamics of genomic instability in metastatic pancreatic cancer. Nature 467, 1109–1113 (2010).
20. Witzel, I., Oliveira-Ferrer, L., Pantel, K., Müller, V. & Wikman, H. Breast cancer brain metastases: biology and new clinical perspectives. Breast Cancer Res. 18, 8 (2016).
21. Kodack, D. P., Askoxylakis, V., Ferraro, G. B., Fukumura, D. & Jain, R. K. Emerging strategies for treating brain metastases from breast cancer. Cancer Cell 27, 163–175 (2015).
22. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012).
23. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
24. Razavi, P. et al. The genomic landscape of endocrine-resistant advanced breast cancers. Cancer Cell 34, 427–438 (2018).
25. Gatza, M. L. et al. A pathway-based classification of human breast cancer. Proc. Natl Acad. Sci. USA 107, 6994–6999 (2010).
26. Creighton, C. J. et al. Proteomic and transcriptomic profiling reveals a link between the PI3K pathway and lower estrogen-receptor (ER) levels and activity in ER+ breast cancer. Breast Cancer Res. 12, R40 (2010).
27. Ricoult, S. J. H., Yecies, J. L., Ben-Sahra, I. & Manning, B. D. Oncogenic PI3K and K-Ras stimulate de novo lipid synthesis through mTORC1 and SREBP. Oncogene 35, 1250–1260 (2016).
28. Cai, Y. et al. Loss of chromosome 8p governs tumor progression and drug response by altering lipid metabolism. Cancer Cell 29, 751–766 (2016).
29. Li, H. et al. The landscape of cancer cell line metabolism. Nat. Med. 25, 850–860 (2019).
30. Patra, K. C. & Hay, N. The pentose phosphate pathway and cancer. Trends Biochem. Sci. 39, 347–354 (2014).
31. Jain, M. et al. A systematic survey of lipids across mouse tissues. Am. J. Physiol. Endocrinol. Metab. 306, E854–E868 (2014).
32. Piomelli, D., Astarita, G. & Rapaka, R. A neuroscientist’s guide to lipidomics. Nat. Rev. Neurosci. 8, 743–754 (2007).
33. Paget, S. The distribution of secondary growths in cancer of the breast. 1889. Cancer Metastasis Rev. 8, 98–101 (1989).
34. Dempster, J. M. et al. Agreement between two large pan-cancer CRISPR–Cas9 gene dependency data sets. Nat. Commun. 10, 5817 (2019).
35. Horton, J. D., Goldstein, J. L. & Brown, M. S. SREBPs: activators of the complete program of cholesterol and fatty acid synthesis in the liver. J. Clin. Invest. 109, 1125–1131 (2002).
36. Varešlija, D. et al. Transcriptome characterization of matched primary breast and brain metastatic tumors to detect novel actionable targets. J. Natl. Cancer Inst. 111, 388–398 (2019).
37. Angelova, M. et al. Evolution of metastases in space and time under immune selection. Cell 175, 751–765 (2018).
38. Zhang, M. et al. Adipocyte-derived lipids mediate melanoma progression via FATP proteins. Cancer Discov. 8, 1006–1025 (2018).
39. Zou, Y. et al. Polyunsaturated fatty acids from astrocytes activate PPARγ signaling in cancer cells to promote brain metastasis. Cancer Discov. 9, 1720–1735 (2019).
40. Pascual, G. et al. Targeting metastasis-initiating cells through the fatty acid receptor CD36. Nature 541, 41–45 (2017).
41. Sullivan, M. R. et al. Quantification of microenvironmental metabolites in murine cancers reveals determinants of tumor nutrient availability. eLife 8, e44235 (2019).

© The Author(s), under exclusive licence to Springer Nature Limited 2020

Methods

No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.

Breast cancer cell lines and barcoding
Breast cell lines were cultured under the recommended conditions from CCLE (https://portals.broadinstitute.org/ccle). Cell line identities were confirmed by SNP fingerprinting as well as RNA-seq, and compared to the CCLE results. All cell lines tested negative for mycoplasma. The fluorescence-luciferase-barcode (FLB) construct was engineered using the FUW lentiviral vector backbone (a gift from D. Baltimore; Addgene plasmid no. 14882). Barcodes 26 nucleotides in length were designed using barcode_generator.py (v.2.8; http://comailab.genomecenter.ucdavis.edu/index.php/), and cloned into the landing pad C-terminal to the TGA stop codon of fluorescence luciferase using Gibson assembly (New England Biolabs). Lentivirus preparation and cell infection were performed according to published protocols available at http://www.broadinstitute.org/rnai. Infected cells were analysed by FACS with a fixed gate for GFP or mCherry, using a Sony SH4800 sorter.

Animal studies
Animal work was performed in accordance with a protocol approved by the Broad Institute Institutional Animal Care and Use Committee (IACUC). NSG female mice (The Jackson Laboratory) at 5–6 weeks of age were used. Cancer cells were suspended in PBS, 0.4% BSA and 100 μl of cell suspensions were injected into the left ventricle of anaesthetized mice (ketamine 100 mg kg^−1; xylazine 10 mg kg^−1). In vivo metastasis progression was monitored via real-time BLI using the IVIS SpectrumCT Imaging System (PerkinElmer) on a weekly basis. Mice were anaesthetized with inhaled isoflurane, injected intraperitoneally with D-Luciferin (150 mg kg^−1), and imaged with the auto exposure setting in prone and supine positions. At the end point, ex vivo BLI was performed by submerging the excised organs in DMEM/F12 medium (Thermo Fisher Scientific) containing D-Luciferin for 10 min and imaging with the auto exposure setting. BLI analysis was performed using Living Image software (v.4.5, PerkinElmer). In the case of the breast cancer cohort study (pilot, group 1 and group 2 in Fig. 1, Extended Data Fig. 1), cell lines were mixed at an equal ratio immediately before animal injection, and cell line pools containing 2 × 10^4 cells per barcoded line were injected. In the case of single breast cell line validation (Extended Data Fig. 1n), cell lines were injected individually at a density of 2 × 10^4 cells, to be comparable with the pooled experiments. In the case of MetMap125 (Fig. 2, Extended Data Fig. 2), PRISM pools of 25 cell lines were used, and 2.5 × 10^5 total cells were injected per mouse, corresponding to 1 × 10^4 cells per barcoded line. Five PRISM pools were injected separately into cohorts of 5–6-week-old NSG mice. In the case of MetMap500, 20 PRISM pools of 25 cell lines were combined to form a large pool of 498 cell lines. The large pool was injected into a cohort of 8–10-week-old NSG mice, with 2.5 × 10^5 cells per mouse, equivalent to a density of 500 cells per line. Mammary fat pad and subcutaneous injections were performed with Matrigel (Corning) support, at a matching density to their intracardiac assays, respectively (Extended Data Fig. 3). For all pooled cell line experiments, mice were euthanized 5 weeks after injection, in a time-matched manner, unless they displayed severe paralysis or poor body condition, in which case they were euthanized earlier. Intracarotid injection of JIMT1 was performed following a published protocol42, at a density of 1 × 10^5 cells per mouse, similar to the intracardiac injection (Fig. 5e). Intracranial injection was performed as previously described43, at a density of 1 × 10^3 cells per perturbation per animal (Fig. 5a–c, Extended Data Fig. 10c, d).

Tissue processing and cancer cell isolation from organs
Organs including brain, lung, liver, kidney were dissociated using gentleMACS Octo Dissociator with Heaters (Miltenyi Biotec). The optimized dissociation solutions and programs (Miltenyi Biotec) are listed in Supplementary Table 9. Bones (from both hind limbs) were chopped into fine pieces and incubated in the dissociation buffer with vigorous shaking. The dissociated cell suspensions were filtered using 100-μm filters, and washed with DMEM/F12 twice. Cell suspensions were then washed with staining buffer (PBS, 2 mM EDTA, 0.5% BSA), and incubated with mouse cell depletion beads according to the instructions (Miltenyi Biotec). Cell suspensions were subjected to negative selection using autoMACS Pro Separator (Miltenyi Biotec) to deplete mouse stroma. Brains were subjected to an additional myelin-debris-depletion step using myelin removal beads II (Miltenyi Biotec). The resultant cell suspensions were then analysed by FACS using a Sony SH4800 sorter, with the fixed gate for GFP or mCherry. DAPI staining was used to exclude dead cells. For bulk RNA-seq, cells were sorted to a single tube in PBS, 0.4% BSA and RNasin Plus RNase Inhibitor (Promega), centrifuged at 1,500 rpm for 10 min, and cell pellets were frozen at −80 °C for downstream use. For single-cell RNA-seq, single cells were sorted into 96-well plates containing cold TCL buffer (Qiagen) containing 1% β-mercaptoethanol, snap frozen on dry ice, and then stored at −80 °C. Ninety single cells were sorted per plate; the remaining wells on the plate were used for negative and positive controls.

RNA extraction, library preparation and sequencing
Individual cell lines, cell line pools before injection, and cells isolated from metastases were analysed by RNA-seq. RNA extraction was performed using Quick-RNA MicroPrep according to the manufacturer’s instructions (Zymo Research). RNA was quantified using an RNA 6000 Pico Kit on a 2100 Bioanalyzer (Agilent). RNA samples from cell numbers lower than 500 were not measured but all were used as input for library preparation. cDNA was synthesized using Clontech SmartSeq v.4 reagents from up to 2 ng RNA input according to the manufacturer’s instructions (Clontech). Full-length cDNA was fragmented to a mean size of 150 bp with a Covaris M220 ultrasonicator and Illumina libraries were prepared from 2 ng of sheared cDNA using Rubicon Genomics Thruplex DNaseq reagents according to the manufacturer’s protocol. The finished double-stranded DNA (dsDNA) libraries were quantified by Qubit fluorometer, Agilent TapeStation 2200, and RT–qPCR using the Kapa Biosystems library-quantification kit. Uniquely indexed libraries were pooled in equimolar ratios and sequenced on Illumina NextSeq500 runs with paired-end 75-bp reads at the Dana-Farber Cancer Institute Molecular Biology Core Facilities. RT–qPCR quantification of barcodes was performed using Maxima First Strand cDNA Synthesis Kit, Taqman Fast Advanced Master Mix, custom synthesized Taqman probes, and QuantStudio 6 PCR System (ThermoFisher Scientific). Single-cell RNA-seq was performed as previously described44.

Bioinformatic analysis
Barcode quantification from RNA-Seq of metastases. Because the RNA-seq library preparation sheared the cDNA randomly into small pieces, demultiplexed RNA-seq reads were mapped to the barcode references using Bowtie 2 local mode45 for barcode detection and quantification. Mapped reads were filtered with the criteria that reads (either 5′ or 3′) must cover over 50% of the barcodes from either end, and counted using samtools. Barcode percentage corresponding to cell composition was calculated for single cell lines, pre-injected cell mixtures and in vivo metastasis samples.

Metastatic potential quantification and feature associations. For the breast cohort study, metastatic potential of cell line j targeting organ i, $M_{i,j}$, was calculated as: $M_{i,j} = \frac{1}{n}\sum_{k=1}^{n} c_i\,p_j$, in which $c_i$ is the total cancer cell number isolated from organ i, $p_j$ is the fractional proportion of cell line j estimated by barcode quantification, and n is the number of replicates of mice. To identify features that associate with brain metastatic potential, a two-class comparison method was used46. The analysis was performed on mutation, copy number, expression,
metabolite, and CRISPR-gene dependency (available at https://depmap.org/portal/). Copy number data were binarized using a cutoff of ≤−1 (loss) and ≥1 (gain).

Cancer transcriptomic analysis from RNA-seq of metastases. Potential mouse contaminating reads were removed by competitive mapping to the human/mouse hybrid genome using BBSplit (https://sourceforge.net/projects/bbmap/). Reads that uniquely mapped to the human genome were then used as input for mapping and gene-level counting with the RSEM package47. Gene count estimates were normalized using the TMM method in edgeR48. For differential analysis, to properly account for the cancer cell composition differences in each in vivo sample, an in silico modelled in vitro mixture was generated first. For each in silico metastasis model, the estimated expression $\hat{g}_i$ of gene i is computed as a weighted average of the cell lines present in the corresponding in vivo sample: $\hat{g}_i = \sum_{j=1}^{M} g_{i,j}\,p_j$, in which $g_{i,j}$ is the baseline in vitro expression of gene i in cell line j, $p_j$ is the fractional proportion of cell line j in the in vivo sample, as estimated by barcode quantification, and M is the number of cell lines present in the in vivo sample. The in vivo and in silico counterparts were then compared using a paired design for each organ in voom-limma46. GSEA was performed using camera or the GSEA-preranked method implemented in fgsea46,49. Single sample GSEA signature projection was performed using the gsva package50. Gene-signature datasets were from MSigDB (https://www.gsea-msigdb.org/).

PRISM in vivo assay
PRISM pool preparation. PRISM cell lines (source of each available at https://depmap.org/portal/) were adapted to the same culture conditions in phenol-red-free RPMI1640 medium (ThermoFisher Scientific) and barcoded as previously described12. SNP fingerprinting authentication was performed before and after barcoding. Mycoplasma contamination was examined (MycoScope, Genlantis) and only negative lines were used for experiments. These included eight oestrogen-receptor-positive breast cancer cell lines. Despite the lack of phenol red (a weak oestrogen), these breast cell lines maintained ESR1 positivity and expression of a downstream marker of its activity, FOXA1. This is probably explained by the remaining oestrogens in the fetal bovine serum (FBS). PRISM cell lines were pooled on the basis of their in vitro doubling bins, at equal number, in the format of 25 lines per pool, and cryopreserved until use. Cells were thawed and recovered for 48 h before in vivo injection. To form the large pool of 498 cell lines, 20 PRISM pools were mixed at equal total number immediately before injection.

Tissue processing, library preparation and sequencing. After in vivo experiments, organs were subjected to tissue dissociation, mouse stroma depletion, and the dissociated cell pellets were frozen at −80 °C as described above. The pellets (≤50 mg dry weight) were lysed in 200 μl freshly prepared lysis buffer with proteinase K, heat digested at 60 °C, and denatured at 95 °C for 10 min. Twenty microlitres of the lysates was used for barcode amplification per 100 μl PCR volume (multiple technical replicates per sample). PCR was performed using the following conditions: 95 °C for 3 min; 98 °C for 20 s, 57 °C for 15 s, 72 °C for 10 s (30 cycles); 72 °C for 5 min; 4 °C stop. PCR libraries were pooled, purified using Select-a-Size DNA Clean & Concentrator Kit (Zymo Research), and quantified using Qubit dsDNA HS Assay Kit (ThermoFisher Scientific) and a 2100 Bioanalyzer (Agilent). Purified libraries at 2 nM with 20% spike-in PhiX DNA were sequenced on Illumina MiSeq or HiSeq at 800 K mm^−2 cluster density.

Metastatic potential quantification. Demultiplexed sequencing reads were mapped to the barcode reference to generate a table of cell line barcode counts for each sample/condition. Sequencing-depth normalized read counts were used for calculation of relative metastatic potential. Relative metastatic potential of cell line j targeting organ i, $rM_{i,j}$, was defined as: $rM_{i,j} = \frac{\frac{1}{n}\sum_{k=1}^{n} c_{i,j}}{\frac{1}{m}\sum_{k=1}^{m} p_j}$, in which $c_{i,j}$ is the read counts of cell line j from organ i, $p_j$ is the read counts of cell line j from the pre-injected population, n is the number of replicate samples of mice, and m is the number of replicates of the pre-injected population. Confidence intervals were calculated using bootstrap resampling.

In vivo CRISPR screen and gene validation
CRISPR–Cas9 versions of cell lines were generated by infecting luciferized cells with Cas9-Blast lentivirus and selecting in 5 μg ml^−1 blasticidin for 10 days with continuous passaging until non-infected controls were killed. For the pooled in vivo screen, JIMT1–Cas9 cells were infected with a CRISPR guide library (Supplementary Table 10) in an arrayed fashion in 6-well plates, and selected in 2 μg ml^−1 puromycin for 4 days. At this time, non-infected controls were killed, and no growth defect was observed in the perturbed cell lines. Post antibiotic selection, cells were pooled and subjected to intracranial injection at 6 × 10^4 cells per mouse in 1 μl PBS. This was equivalent to 1 × 10^3 cells per guide on average per mouse. Intracranial growth was allowed to progress for 4 weeks, and brain tissues were processed adopting the workflow of the PRISM in vivo assay, except that guides were amplified using primers targeting the guide vector. Demultiplexed sequencing reads were mapped to the guide reference to generate a table of barcode counts for each guide for each sample. Sequencing depth was normalized using the upper-quartile method and relative depletion was quantified using a linear model in limma46. For individual gene validation (Fig. 5c, Extended Data Fig. 10c, d), Cas9-expressing cells of different cell lines were infected with corresponding guides, selected in 2 μg ml^−1 puromycin for 4 days, and subjected to intracranial injection at 1 × 10^3 cells per mouse in 1 μl PBS. Two independent guides per gene were tested, with one mouse per guide. Intracranial growth was monitored by BLI following injection.

Liquid chromatography–mass spectrometry lipidomics
Positive ion mode analyses of polar and nonpolar lipids (C8-pos) were conducted using a liquid chromatography–mass spectrometry (LC–MS) system composed of a Shimadzu Nexera X2 U-HPLC (Shimadzu) coupled to an Exactive Plus orbitrap mass spectrometer (ThermoFisher Scientific). Cellular extracts were collected from 6-well plate culture, in LC–MS-grade isopropanol (Sigma-Aldrich) containing an internal standard 1,2-didodecanoyl-sn-glycero-3-phosphocholine (Avanti Polar Lipids). Extracts were centrifuged for 10 min at 10,000g to remove residual cellular debris. After centrifugation, supernatants were injected directly onto a 100 × 2.1 mm, 1.7-μm ACQUITY BEH C8 column (Waters). The column was eluted isocratically with 80% mobile phase A (95:5:0.1 v/v/v 10 mM ammonium acetate/methanol/formic acid) for 1 min followed by a linear gradient to 80% mobile phase B (99.9:0.1 v/v methanol/formic acid) over 2 min, a linear gradient to 100% mobile phase B over 7 min, then 3 min at 100% mobile phase B. Mass spectrometry analyses were performed using electrospray ionization in the positive ion mode using full scan analysis over 200 to 1,000 m/z at 70,000 resolution and 3 Hz data acquisition rate. Other mass spectrometry settings were as follows: sheath gas 50, in-source collision-induced dissociation 5 eV, sweep gas 5, spray voltage 3 kV, capillary temperature 300 °C, S-lens RF 60, heater temperature 300 °C, microscans 1, automatic gain control target 10^6, and maximum ion time 100 ms. Lipid identities were determined on the basis of comparison to reference standards and reference plasma extracts and were denoted by the total number of carbons in the lipid acyl chain(s) and total number of double bonds in the lipid acyl chain(s).

Western blot
Protein lysates were prepared in RIPA lysis buffer (ThermoFisher Scientific) with cOmplete Mini EDTA-free Protease Inhibitor Cocktail (Roche). Western blot was performed using NuPAGE gel (ThermoFisher Scientific) with wet tank blotting (Bio-Rad) and Odyssey detection system (LI-COR). SREBF1 primary antibody (14088-1-AP, Proteintech),
SCD (CD.E10) antibody (ab19862, Abcam), GAPDH (D16H11) XP rabbit monoclonal antibody (5174S, Cell Signaling), β-actin (8H10D10) mouse monoclonal antibody (3700S, Cell Signaling), and IRDye 800CW goat anti-mouse IgG (926-32210, LI-COR), IRDye 680RD goat anti-rabbit IgG (926-68071, LI-COR) secondary antibodies were used. Western blot was performed for cells cultured in different medium conditions. These include RPMI1640 with 10% FBS, with 10% delipidated FBS, with 10% human cerebrospinal fluid (991-19-P-5, Lee BioSolutions), or with 1% SM1 supplement (05711, STEMCELL Tech), or brain-slice-conditioned medium. Brain-slice-conditioned medium was prepared by submerging brain slices (150 μm) in RPMI1640 (no serum) for 48 h. Delipidated FBS was prepared as described51.

Clinical data analysis
METABRIC, TCGA and MSK targeted-sequencing breast cancer datasets were downloaded from cBioPortal52. The EMC-MSK dataset including 615 primary tumours (GSE2035, GSE2603, GSE5327 and GSE12276) and the 65-metastasis-sample dataset (GSE14020) were collected and processed as previously described18. Paired primary breast tumour and brain metastasis RNA-seq was obtained from ref. 36. To exclude the confounding effect of brain stroma contamination in this dataset, a contamination indicator generated from GSE52604 was applied, and the contaminating effect was regressed out, generating a corrected gene matrix. PI3K-response signatures were from refs. 25,26. Signature analysis was conducted as described7. Hierarchical clustering and heatmaps were generated using the gplots package. Other plots were generated using ggplot2. Log-rank tests of survival curve difference were calculated using the survival package. A multivariate Cox proportional hazards model was built using the coxph function (Extended Data Fig. 6h). Significance of overlap was calculated using the chisq.test or fisher.test function.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

42. Zhang, C., Lowery, F. J. & Yu, D. Intracarotid cancer cell injection to produce mouse models of brain metastasis. J. Vis. Exp. 120, e55085 (2017).
43. Ozawa, T. & James, C. D. Establishing intracranial brain tumor xenografts with subsequent analysis of tumor growth and response to therapy using bioluminescence imaging. J. Vis. Exp. 41, e1986 (2010).
44. Tirosh, I. et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science 352, 189–196 (2016).
45. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
46. Ritchie, M. E. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).
47. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
48. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2010).
49. Korotkevich, G., Sukhov, V. & Sergushichev, A. Fast gene set enrichment analysis. Preprint at https://doi.org/10.1101/060012 (2019).
50. Hänzelmann, S., Castelo, R. & Guinney, J. GSVA: gene set variation analysis for microarray and RNA-seq data. BMC Bioinformatics 14, 7 (2013).
51. Hosios, A., Li, Z., Lien, E. & Heiden, M. Preparation of lipid-stripped serum for the study of lipid metabolism in cell culture. Bio Protoc. 8, e2876 (2018).
52. Cerami, E. et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2, 401–404 (2012).

Acknowledgements We thank J. L. Goldstein for suggestions; A. Regev, N. Marjanovic and A. Bankapur for assistance with single-cell RNA-seq and analysis; Z. Herbert for assistance with RNA-seq; T. Mason for assistance with next-generation sequencing; S. Kim and S. Roberge for assistance with animal work; B. Wong for suggestions on figure and portal designs; and G. Wei, U. Ben-David, S. Corsello, P. Tsvetkov, I. Tirosh, R. Hosking and C. Mader for discussions. X.J. and G.B.F. were Susan G. Komen Fellows. A.A. is an HHMI Medical Research Fellow. This work was supported by BroadNext10, Broad SharkTank grants (X.J.), HHMI (T.R.G.), and in part by Koch Institute/DFHCC Bridge project grant (M.G.V.H. and R.K.J.). R.K.J. acknowledges support from the NIH (R35CA197742, R01CA208205 and U01CA224173), National Foundation for Cancer Research; the Ludwig Center at Harvard; the Jane’s Trust Foundation; the Advanced Medical Research Foundation and by the U.S. Department of Defense Breast Cancer Research Program Innovator Award W81XWH-10-1-0016.

Author contributions X.J. conceptualized the project, conducted experiments, collected data and analysed results. Z.D. assisted with experiments. K.N. and T.N. assisted with bioinformatic and RNA-seq analysis. A.A., A.D., C.B.C. and M.G.V.H. performed lipidomics and data interpretation. G.B.F. and R.K.J. performed intracranial injection experiments and data analysis. L.P. and A.A.T. assisted with petal plot and portal development. C.Z., L.W., D.R. and J.R. assisted with PRISM assay and data generation. V.M. and K.C. performed tissue imaging, data acquisition and analysis. T.R.G. supervised the research. X.J. and T.R.G. wrote the manuscript.
Data availability Competing interests T.R.G. receives research funding unrelated to this project from Bayer
MetMap data and interactive visualization can be accessed at https:// HealthCare, Novo Ventures and Calico Life Sciences; holds equity in FORMA Therapeutics;
is a consultant to GlaxoSmithKline; and is a founder of Sherlock Biosciences. M.G.V.H. is a
pubs.broadinstitute.org/metmap. RNA-seq data generated from scientific advisory board member for Agios Pharmaceuticals, Aeglea Biotherapeutics,
this study have been deposited in the Gene Expression Omnibus Auron Therapeutics and iTeos Therapeutics. R.K.J. received a honorarium from Amgen;
(GEO) under accession numbers GSE148283 and GSE148372. Addi- consultant fees from Chugai, Merck, Ophthotech, Pfizer, SPARC and SynDevRx; owns equity
in Accurius, Enlight, Ophthotech and SynDevRx; and serves on the Boards of Trustees of
tional datasets used in this study include METABRIC, TCGA and Tekla Healthcare Investors, Tekla Life Sciences Investors, Tekla Healthcare Opportunities
MSK-targeted-sequencing breast cancer datasets from cBioPortal, Fund and Tekla World Healthcare Fund. No reagents or funding from these organizations
were used in this study. X.J. and T.R.G. are named as inventors on pending PCT Patent
the EMC-MSK dataset (GSE2035, GSE2603, GSE5327 and GSE12276),
Application No. PCT/US20/29584 filed by The Broad Institute, which describes
the 65-metastasis-sample dataset (GSE14020), paired primary tumour compositions and methods for characterizing the metastatic potential of cancer cell lines.
and brain metastasis RNA-seq from ref. 36, and GSE52604. Source data The other authors declare no competing interests.
are provided with this paper.
2969-2.
Code availability Correspondence and requests for materials should be addressed to X.J. or T.R.G.
Peer review information Nature thanks Roger Gomis, Jason Locasale, Ultan McDermott and
Custom codes used for this study are accessible at the MetMap portal the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
(https://pubs.broadinstitute.org/metmap). Reprints and permissions information is available at http://www.nature.com/reprints.
Article

Extended Data Fig. 1 | An in vivo barcoding approach to establish multiplexed cancer metastasis xenografts and validation using orthogonal assays. a, Principal component analysis (PCA) of transcriptomic expression of the breast cancer collection from CCLE, and the pooling schemes focusing on basal-like breast cancer. G, GFP; R, mCherry. The linked numbers indicate the labelling barcodes. b, Real-time BLI monitoring of the overall metastasis progression from pilot, group1, group2 cell line pools. Data are mean ± s.e.m. n = 5 (pilot), n = 8 (group1), n = 7 (group2) mice. c, Total cancer cell numbers isolated by FACS from each target organ from pilot, group1, group2 pools. Each dot represents an animal. Box plots display quartiles of the data. d, Cancer cell composition of metastases from different organs as determined by barcode abundance from pilot, group1, group2 pools. pilot: G portion samples are highlighted in green, R portion samples are highlighted in red. preinj, pre-injected population. Data c, d were used to quantify the metastatic potential presented in Fig. 1b. e, An example of the gating strategy to isolate GFP+ barcoded cancer cells for the pilot pool. Infected cell lines expressed GFP at different levels as shown in the histogram, and a fixed gate was used to enrich cells with closer expression level. Numbers correspond to cell percentages within the gate. f, An example of barcode mapping result visualized by Integrative Genomics Viewer (IGV). g, Distribution of the barcode read counts versus all gene transcript counts. Barcodes are among the top 10% highly expressed genes, allowing robust quantification. h, An example of barcode read quantification in the pre-injected and metastasis samples from pilot pool. Barcodes are listed as in a. cpk, counts per thousand. i, Taqman assay on in vitro cultured barcoded cell lines from the pilot pool. The signal is very specific to each barcode and there is no cross detection. j, Quantification of barcode abundance and cancer cell composition using the Taqman RT–qPCR assay in the pre-injected and metastasis samples from the pilot pool. The results agree with barcode quantification from bulk RNA-Seq (Extended Data Fig. 1d). k, Single cell RNA-Seq of metastases from different organs from the pilot pool. Single cancer cells isolated from each organ were sorted into 96-well plates, with 90 cells per plate (rest 6 wells for positive and negative controls), and subjected to Smart-Seq2. PCA revealed that PC1 maximally separated the cancer cells into 2 clusters (CLs), with CL1 enriched in cells isolated from brain, and CL2 enriched in cells isolated from lung, liver and bone. Heat map on the right shows gene expression that associates with PC1 and clustering of cells. Based on marker expression, CL1 corresponds to HCC1954 (ERBB2+, CDH1+) and CL2 corresponds to MDAMB231 (CDKN2A loss, VIM+). l, Projection of marker gene expression on the PCA plot. m, Cancer cell composition based on single cell RNA-Seq data. The results agree with barcode quantification from bulk RNA-Seq (Extended Data Fig. 1d). n, Real-time BLI monitoring of metastasis progression of the 8 cell lines that were individually tested. Each plot highlights one of the 8 lines. Data are mean ± s.e.m. Each group contains 4 mice. o, Scatter plot showing the correlation of overall metastatic potential (5 organs combined) from pooled cell line experiments with whole body BLI of metastases measured individually. Pearson’s correlation coefficient and its test P value are presented.

Extended Data Fig. 2 | Using PRISM cell line pools for metastatic potential profiling. a, Optimizing the workflow of metastatic potential mapping using PRISM. A PRISM pool of 25 cell lines was used for testing the need of GFP labelling and cancer cell purification. The barcode abundance altered compared to the unlabelled population after GFP labelling as shown by the pie chart. b, A detailed line-by-line view of barcode abundance before and after GFP labelling. The unlabelled cell pool had more even distribution. Post labelling, several lines showed noticeable dropout, but all lines were detectable. c, Scatter plot comparing barcode enrichment after normalizing to the pre-injected input from the two experiments. Pearson’s correlation coefficient and its test P value are presented. Strong positive correlation is observed, with the exception of one cell line U2OS. d, Quality control of MetMap500 and MetMap125 datasets showing initial barcode abundance in the pre-injected populations. MetMap500, 1 large pool containing 498 cell lines was profiled, with 10 cell lines showing low initial abundance. These 10 cell lines were not detected in any in vivo sample, and were excluded from subsequent analysis. MetMap125, 5 pools of 25 cell lines were profiled separately and data were combined for analysis. e, Quality control of MetMap500 and MetMap125 datasets showing scatter plots of raw barcode abundance from in vivo organs versus the data normalized to the pre-injected input (in d). A strong linear relationship was observed.
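
The normalization to pre-injected input and the Pearson correlation mentioned in this legend can be illustrated with a minimal sketch. It assumes that "normalizing to the pre-injected input" means dividing each line's fractional barcode abundance in an organ by its fractional abundance in the input pool; the legend does not spell out the formula, and the function names and counts below are hypothetical.

```python
import math

def to_fractions(counts):
    """Convert raw barcode read counts into fractional abundances."""
    total = sum(counts.values())
    return {line: count / total for line, count in counts.items()}

def enrichment_over_input(organ_counts, input_counts):
    """Fractional abundance in an organ divided by fractional abundance
    in the pre-injected pool (assumed form of the normalization)."""
    organ = to_fractions(organ_counts)
    preinj = to_fractions(input_counts)
    # Lines absent from the input cannot be normalized and are skipped.
    return {line: organ.get(line, 0.0) / frac
            for line, frac in preinj.items() if frac > 0}

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical two-line pool: equal input, unequal recovery from an organ.
input_counts = {"LINE_A": 50, "LINE_B": 50}
organ_counts = {"LINE_A": 90, "LINE_B": 10}
ratios = enrichment_over_input(organ_counts, input_counts)
```

Comparing two experiments then amounts to calling `pearson_r` on the per-line enrichment values from each, taken in the same line order.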
Extended Data Fig. 3 | Subcutaneous injection of PRISM cell line pool. a, The same PRISM pool of 498 cell lines used for MetMap500 profiling was tested using subcutaneous injection on a cohort of 6 mice. Survival curves compare animal survival difference between subcutaneous and intracardiac (IC) injections, P value calculated using two-sided, log-rank test. b, Total numbers of cell lines detected in animals from the subcutaneous and IC injections. Detected lines are coloured in pink and non-detected lines are coloured in light-blue. P value calculated using two-sided t-test. c, Scatter plot showing barcode-quantified tumorigenic potential and metastatic potential from subcutaneous and IC experiments respectively. d, Group1 of basal breast cancer pool (Extended Data Fig. 1a) was subjected to mammary fat pad injection, barcode quantitation through RNA-Seq, and cell number inference.
Extended Data Fig. 4 | Association of overall metastatic potential with clinical parameters. a, Bar plots showing significance of single variate and multi variate association analysis with metastatic potential in MetMap500. P values are calculated using linear regression and Anova (type II) of the linear models. The dotted lines indicate 0.05 cutoff. b, Box plots showing metastatic potential of cell lines stratified by metastasis status in the corresponding patients and cancer lineage. Box plots display quartiles of the data. Outlier points extend beyond 1.5 × interquartile ranges from either hinge. c, Scatter plots showing correlation of metastatic potential with patient age, stratified by cancer lineage. An inverse correlation was observed in several cancer types. d–g, Correlation of overall metastatic potential with derived site (d), time length in culture to derive the cell lines (e), mutation burden (f) and cell doubling (g) in the 21 basal breast cancer cohort. d, P value calculated using two-sided t-test. e–g, Pearson’s correlation coefficients and test P values are presented.
Extended Data Fig. 5 | Genetic correlates of brain metastatic potential in basal-like breast cancer. a, A line-by-line view of brain metastatic potential and its associated features at genetic, expression, metabolite, and gene dependency levels. Mutation: mutant (MUT), wild-type (WT). Copy number: data are binarized, with deletion (DEL) cutoff ≤ −1 and amplification (AMP) cutoff ≥ 1. Expression signatures: 1. Hallmark: PI3K/AKT/MTOR signalling, 2. GO: ERBB signalling pathway, 3. GO: ERBB2 signalling pathway, 4. Burton: adipogenesis peak at 8hr, 5. GO: carnitine metabolic process, 6. Reactome: mitochondrial fatty acid beta oxidation, 7. GO: short chain fatty acid metabolic process. Data not available for the cell lines are marked with X. b, c, Scatter plots showing the correlation of SREBF1 in vitro dependency and brain metastatic potential in MetMap500 (b) and MetMap125 (c). Strong inverse correlation was observed for breast cancer in both datasets. Each dot represents a cell line.
Extended Data Fig. 6 | Association of chr 8p gene copy number status and PI3K-response signatures with brain metastasis in clinical breast cancer specimens. a, Heat maps showing coordinated expression of chr 8p genes mirrored their copy number status in the two large breast cancer datasets, METABRIC and TCGA. The 8plow cluster is defined by CNA data. The right panel shows distribution of 8plow cluster in different breast cancer subtypes and its association with disease specific survival. P values calculated using two-sided, log-rank tests. CNA, Copy Number Alteration. Exp, RNA-Seq Expression. b, Hierarchical clustering of primary breast tumours by 8p gene expression in the EMC-MSK dataset. The 8plow cluster is enriched in tumours that developed brain metastasis, but not lung or bone metastasis. The right panel shows organ-specific metastasis free survival curves stratified by 8plow status. The 8plow cluster displays poorer brain metastasis compared to the 8pWT cluster. Brain metastasis free survival curves stratified by 8plow status in different subtypes is also presented. P values calculated using two-sided, log-rank tests. c, Hierarchical clustering of breast cancer metastases by 8p gene expression, with the 8plow cluster being enriched in brain metastases. d, Chr 8p CNA status determined by Targeted Seq in the MSK metastatic breast cancer dataset. Brain metastases are enriched in chr 8p deletion compared to primary tumour, local recurrence and metastases at other sites. The 8plow cluster predicts poor brain metastasis free survival. P values calculated using two-sided, log-rank tests. LN, lymph node. e, Heat maps showing co-regulated patterns of two independent PI3K-response signatures in METABRIC and TCGA breast cancer datasets. PI3Ksig.1 was generated by overexpression of PIK3CAmut in breast epithelial cells. PI3Ksig.2 was generated by PI3K inhibitor treatment in the CMap database. The right panel shows distribution of PI3Ksighigh cluster in different breast cancer subtypes and its association with disease specific survival. P values calculated using two-sided, log-rank tests. f, Hierarchical clustering of primary breast tumours by PI3K signatures in the EMC-MSK dataset. The PI3Ksighigh cluster is enriched in tumours that developed brain metastasis. The right panel shows organ-specific metastasis free survival curves stratified by PI3K signatures. The PI3Ksighigh cluster displayed poorer brain metastasis. Brain metastasis free survival curves stratified by PI3K signatures in different subtypes is also presented. P values calculated using two-sided, log-rank tests. g, Hierarchical clustering of breast cancer metastases by PI3K signatures, with the PI3Ksighigh cluster being enriched in brain metastases. h, Heat maps showing significant yet non-complete overlap between 8plow and PI3Ksighigh clusters in the EMC-MSK dataset. 8plow and PI3Ksighigh clusters co-capture a subset of patients with the worst brain metastasis prognosis. P values calculated using two-sided, log-rank tests. The lower panel presents a Cox proportional-hazards model of brain metastasis free survival using multi variates – 8p, PI3Ksig, and breast cancer subtype. The 8plow/PI3Ksighigh cluster is the most associated with brain metastasis. i, 8plow and PI3Ksighigh clusters co-capture the majority of brain metastasis samples.
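
The two-sided log-rank comparisons quoted throughout this legend were computed by the authors with the R survival package. As an illustration only, the standard two-group log-rank statistic can be sketched in Python as below; this is a textbook implementation and not the authors' code, and the function name and follow-up times are hypothetical.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, P value).

    times_*: follow-up times; events_*: 1 if the event was observed,
    0 if the observation was censored.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    obs_minus_exp = 0.0  # observed minus expected events in group A
    variance = 0.0       # hypergeometric variance summed over event times
    for t in event_times:
        at_risk_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        at_risk_b = sum(1 for ti, _, g in data if ti >= t and g == 1)
        deaths_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        deaths_b = sum(1 for ti, e, g in data if ti == t and e and g == 1)
        n = at_risk_a + at_risk_b
        d = deaths_a + deaths_b
        obs_minus_exp += deaths_a - d * at_risk_a / n
        if n > 1:
            variance += d * (at_risk_a / n) * (at_risk_b / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical example: group A relapses early, group B relapses late.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
```

The P value uses the chi-square approximation with one degree of freedom, which is the usual large-sample form of the test.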
Extended Data Fig. 7 | Lipid metabolite profile changes upon SREBF1 knockout. Heat maps showing relative lipid abundance in cells cultured in medium supplemented with serum or delipidated serum. SREBF1-WT and SREBF1-KO of JIMT1 (PIK3CAmut) and HCC1806 (8plow) were used. Lipid species groupings and lipid desaturation levels are also presented. WT, wild-type; KO, knockout.
Extended Data Fig. 8 | Analysis of multiplexed breast cancer metastasis in vivo transcriptomes. a, A schematic of the differential analysis approach for in vivo transcriptomes with mixed cancer cell lines. An in silico transcriptome was modelled based on single cell line in vitro transcriptomes and cell line composition (comp.) of the metastasis sample. The in silico profile was then compared with the actual in vivo data in a pairwise manner. b, Comparison of in silico modelled profiles to the actual pre-injected or in vivo metastasis sample profiles. The pre-injected populations are direct mixtures of in vitro cell lines and show tight correlation with in silico data. In vivo samples show large fold changes. c, Box plots showing log2 fold changes of MUCL1 and SCGB2A2 in in vivo metastasis samples and pre-injected cells. Each point represents a sample. Box plots display quartiles of the data. Outlier points extend beyond 1.5 × interquartile ranges from either hinge. d, Heat map showing log2 fold change of lung metastasis genes (Minn et al.) in lung, liver, kidney and bone metastasis samples from the pilot study, where MDAMB231 dominated the population. e, Correlation of gene expression changes in different metastasis sites. Pre-injected population had no expression change and thus showed no correlation with in vivo samples. Brain metastases showed weaker correlations with extracranial metastases. f, Bubble plot showing enrichment of Hallmark gene pathways (MSigDB) comparing in vivo expression of metastases at different organ sites to their in vitro counterparts. g, Bubble plot showing in vivo upregulation of SREBF1, SCD and SREBF1-response signature in brain metastases. h, i, GSEA analysis of lipid metabolism gene sets using in vivo RNA-Seq profiles combined by metastasis organ sites irrespective of sample or cell line composition (h). Gene sets related to lipid metabolism are selectively enriched on top in the brain but not in other organs or in vitro. Restricting analysis to JIMT1-dominant samples revealed a similar result. No enrichment was seen in normal brain when analysis was performed on GTEX normal tissue (i). Each tick represents a lipid metabolism gene set from MSigDB. ***, P = 0.001; ** = 0.01.
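
The in silico modelling described in panel a can be pictured as a composition-weighted mixture of single-line in vitro profiles, followed by a log2 fold change against the observed in vivo profile. The sketch below assumes a linear mixture on linear-scale expression values and a pseudocount of 1; neither detail is stated in the legend, and all names and numbers are hypothetical.

```python
import math

def in_silico_profile(in_vitro, composition):
    """Composition-weighted average of per-line in vitro expression.

    in_vitro: {cell_line: {gene: expression}} on a linear scale.
    composition: {cell_line: abundance}; rescaled here to sum to 1.
    """
    total = sum(composition.values())
    genes = next(iter(in_vitro.values())).keys()
    return {
        gene: sum(composition[line] / total * in_vitro[line][gene]
                  for line in composition)
        for gene in genes
    }

def log2_fold_change(observed, modelled, pseudocount=1.0):
    """Per-gene log2 ratio of observed in vivo to modelled expression."""
    return {
        gene: math.log2((observed[gene] + pseudocount)
                        / (modelled[gene] + pseudocount))
        for gene in modelled
    }

# Hypothetical metastasis made of two lines in equal proportion.
in_vitro = {"LINE_A": {"GENE_X": 10.0}, "LINE_B": {"GENE_X": 30.0}}
composition = {"LINE_A": 0.5, "LINE_B": 0.5}
modelled = in_silico_profile(in_vitro, composition)
lfc = log2_fold_change({"GENE_X": 41.0}, modelled)
```

Under this model a pre-injected pool, being a direct mixture of in vitro cells, would give fold changes near zero, whereas in vivo samples can depart from the modelled profile.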
Extended Data Fig. 9 | Expression of TGFβ signalling, EMT status, inflammatory response and lipid metabolism genes in clinical breast cancer metastasis specimens. a, Comparison of brain metastasis versus extracranial metastasis clinical samples. Lower expression of TGFβ signature genes and EMT signature genes in brain metastases than other metastasis sites. Enriched expression of selective SREBF1 target genes (including FASN, SCD, SREBF1 itself) and Pentose Phosphate Pathway (PPP) genes in brain metastases. b, c, A strategy to remove brain stroma contamination effect from brain metastasis expression profiles when performing comparison of paired primary breast tumour and brain metastasis clinical specimens. A gene signature indicating brain stroma contamination was derived from comparison of brain with breast and breast cancer brain metastasis (b). Arrowheads indicate a few brain metastasis samples with noticeable brain stroma contamination. A brain contamination score was calculated and its effect was regressed out in the RNA-Seq data of matched primary tumours and brain metastases (c). The heat map shows expression of brain stroma indicator before and after removal of the contamination effect. d, e, Paired comparison of primary breast tumour and brain metastasis clinical specimens after removal of brain stroma contamination. d, Lipid metabolism genes and PPP genes. e, Signature scores were projected for each sample using the corrected RNA-Seq profiles. P, Primary breast tumour; M, brain Metastasis; upregulation in red, downregulation in blue. P values calculated using paired, two-sided t-tests.
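
Regressing out a contamination score, as described in panels b and c, can be sketched for a single gene with ordinary least squares: fit expression against the score across samples and subtract the fitted slope term. The legend does not specify the regression model, so the per-gene linear fit below is an assumption, and the function name and values are hypothetical.

```python
def regress_out(expression, score):
    """Remove the linear effect of a contamination score from one gene.

    expression: expression of the gene across samples.
    score: contamination score of the same samples, in the same order.
    Returns corrected values that keep the gene's original mean.
    """
    n = len(score)
    mean_score = sum(score) / n
    mean_expr = sum(expression) / n
    sxx = sum((s - mean_score) ** 2 for s in score)
    if sxx == 0:
        # A constant score explains no variation; nothing to remove.
        return list(expression)
    slope = sum((s - mean_score) * (e - mean_expr)
                for s, e in zip(score, expression)) / sxx
    return [e - slope * (s - mean_score) for s, e in zip(score, expression)]

# Hypothetical gene whose expression rises with the contamination score.
corrected = regress_out([1.0, 3.0, 5.0], [0.0, 1.0, 2.0])
```

Applying the same function to every gene would yield a corrected matrix in which no gene retains a linear dependence on the score.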
Extended Data Fig. 10 | In vivo and in vitro effects of SREBF1 knockout. a, Growth kinetics of SREBF1-WT and -KO cells in in vitro culture medium with 10% serum or 10% delipidated serum. Cell growth was monitored by Incucyte real-time imaging. WT, wild-type, in black; KO, knockout, in red. Two independent guides were used per group. b, Fluorescence imaging of metastases in serial brain sections from mice receiving intracardiac injection of JIMT1 SREBF1-WT or -KO cells (Fig. 5d). Confocal tile scans of representative sections are presented at the lower panel. GFP+ signals indicate cancer lesions. Circles highlight macro-metastatic lesions and arrows indicate micro lesions. c, d, One-by-one assessment of lipid metabolism gene fitness in additional brain metastatic cell lines through intracranial injection. SREBF1 was tested for HCC1954, MDAMB231 (c) and HCC1806. Additional genes were tested for HCC1806 (d). Cell outgrowth in brain metastasis was monitored by real-time BLI. Two independent guides per gene were tested, in a one guide one mouse fashion. e–g, Outgrowing (HCC1806) or residual (JIMT1) SREBF1-KO cells from brain metastases were derived for CRISPR-seq (e), western blot (f), and RT–qPCR (g) assays. e, CRISPR-seq quantifying SREBF1 gene editing efficiencies of brain-derived and pre-injected cells. f, Western blot quantifying SREBF1 protein levels. g, RT–qPCR quantifying relative expression of SREBF1, SCD, CD36, FABP6 in brain-derived versus pre-injected cells. Pre-injected WT HCC1806 was used as reference. h, i, Brain-derived and pre-injected HCC1806 cells were cultured in brain-slice-conditioned medium (CM) or medium supplemented with cerebrospinal fluid, or serum, or delipidated serum, or SM1 supplement, and western blot (h) or RT–qPCR was performed (i). SREBF1, SCD and CD36 were upregulated when cells were cultured in brain slice CM, cerebrospinal fluid, and delipidated serum. Brain-derived SREBF1-KO cells were better at inducing SCD and CD36, in comparison to pre-injected SREBF1-KO cells. Experiments were performed twice independently with similar results.
Corresponding author(s): Todd R Golub; Xin Jin
Reporting Summary

Software and code

Data collection Bioluminescence imaging data were acquired with Living Image software (v4.5, PerkinElmer). Lipidomics mass spectrometry data were acquired using an LC-MS system composed of a Shimadzu Nexera X2 U-HPLC (Shimadzu Corp) coupled to an Exactive Plus orbitrap mass spectrometer (ThermoFisher Scientific).
Data analysis The following software was used for data analysis: Living Image software (v4.5), Bowtie 2 (v2.2.8), samtools (v1.3.1), BBSplit (https://sourceforge.net/projects/bbmap/), RSEM (v1.3.1), R statistical software (v3.6.2), ggplot2 (3.3.0), limma (3.42.2), edgeR (3.28.0), gsva (1.34.0), gplots (3.0.1.2), survival (3.1-8), fgsea (1.12.0), GSEA (v3.0), GenePattern (v2.0).
Data
October 2018
MetMap data and interactive visualization can be accessed at pubs.broadinstitute.org/metmap. RNA-seq data generated from this study have been deposited to
Gene Expression Omnibus (GEO), at accession numbers GSE148283 and GSE148372. Additional datasets used in this study include METABRIC, TCGA, and MSK-
targeted-sequencing breast cancer datasets downloadable from cBioPortal, EMC-MSK dataset (GSE2035, GSE2603, GSE5327, GSE12276), 65 metastasis sample
dataset (GSE14020), paired primary tumor and brain metastasis RNA-Seq from Vareslija et al, and GSE52604.

Sample size Sample sizes were determined to be adequate for minimal n required for statistical tests, or consistency of measurable differences between
groups following guidance and experience from Ref. 5, 7, 18.
Data exclusions Failed RNA-Seq samples were excluded from analysis presented in the manuscript. In the MetMap500 experiment (Fig. 2), one animal died early and its organs could not be collected in time; this animal was excluded from analysis.
Replication Cell culture based experiments (including growth assay, RT-qPCR, western blot) were performed twice independently. Animal experiments
were validated using completely independent methods instead of direct repeat (Pooled experiment vs individual injection in Fig. 1a, Extended
Data Fig. 2g; MetMap500 vs MetMap125 in Fig. 2c; mini-pool CRISPR screen vs one-by-one testing in Fig. 5a-c).
Randomization Randomization was not applicable to experiments in this study. In MetMap profiling, we varied pooling format, cell density, cohort size,
animal age to account for these potential covariates.
Blinding Blinding to group allocations was not applicable to experiments in this study.


Antibodies
Antibodies used SREBF1 primary antibody (14088-1-AP, Proteintech)
SCD (CD.E10) antibody (ab19862, Abcam)
GAPDH (D16H11) XP® Rabbit mAb (5174S, Cell Signaling)
β-Actin (8H10D10) Mouse mAb (3700S, Cell Signaling)
IRDye® 800CW Goat anti-Mouse IgG (926-32210, LI-COR)
IRDye® 680RD Goat anti-Rabbit IgG secondary antibodies (926-68071, LI-COR).
Validation SREBF1 primary antibody (14088-1-AP, Proteintech): validated by manufacturer, and by this study (Extended Data Fig. 11f,h), and cited in publications, suitable for western blot
SCD (CD.E10) antibody (ab19862, Abcam): validated by manufacturer, and by this study (Extended Data Fig. 11f,h), suitable for western blot
GAPDH (D16H11) XP® Rabbit mAb (5174S, Cell Signaling): validated by manufacturer and cited in publications, suitable for western blot
β-Actin (8H10D10) Mouse mAb (3700S, Cell Signaling): validated by manufacturer and cited in publications, suitable for western blot

Cell line source(s) All cell lines listed in Supplementary Table 1 and 3 were obtained from CCLE.
Authentication Cell lines were authenticated by DNA fingerprinting analysis. The breast cell line identities were also confirmed by RNA-Seq
and compared to CCLE RNA-Seq profiles.
Mycoplasma contamination All cell lines were confirmed to be mycoplasma free using the MycoAlert™ Mycoplasma Detection Kit (Lonza).
Commonly misidentified lines (See ICLAC register) PC-14 was identical to PC-9 as reported before (https://web.expasy.org/cellosaurus/CVCL_1640; https://www.sigmaaldrich.com/catalog/product/sigma/cb_90071810?lang=en&region=US). To keep consistent with CCLE nomenclature, PC14_LUNG was used. KPL-1 was found to be a MCF-7 derivative (https://web.expasy.org/cellosaurus/CVCL_2094). To keep separate from MCF-7 and consistent with CCLE nomenclature, KPL1_BREAST was used.

Laboratory animals NOD scid gamma (NSG) female mice (The Jackson Laboratory) aged 5–6 or 8–10 weeks were used for metastasis xenograft studies.
Broad Vivarium’s housing conditions for NSG mice include sterilized, individually ventilated cages with cellulose bedding. Water
bottles are supplied with acidified, reverse osmosis water. The holding room is maintained under positive pressure, temperature
70°F (+/-2°F), humidity 40% (+/- 10%), lighting 12 on/12 off light cycle.
Wild animals No wild animals were used in the study.
Field-collected samples No field collected samples were used in the study.
Ethics oversight Animal work was performed in accordance with a protocol approved by the Broad Institute Institutional Animal Care and Use
Committee (IACUC).
Flow Cytometry
Plots
Confirm that:
The axis labels state the marker and fluorochrome used (e.g. CD4-FITC).
The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers).
All plots are contour plots with outliers or pseudocolor plots.
A numerical value for number of cells or percentage (with statistics) is provided.
Methodology
Sample preparation Organs were dissociated using dissociation protocols listed in Supplementary Table 9 with gentleMACS Octo Dissociator (Miltenyi
Biotec). Dissociated cell suspensions were filtered using 100 μm filters, and washed with DMEM/F12 twice. Cell suspensions
were then washed with staining buffer (PBS + 2mM EDTA + 0.5% BSA), and incubated with mouse cell depletion beads according
to the instructions (Miltenyi Biotec). Cell suspensions were subjected to negative selection using autoMACS Pro Separator
(Miltenyi Biotec) to deplete mouse stroma. Brains were subjected to an additional myelin debris depletion step using myelin
removal beads II (Miltenyi Biotec). In vitro cultured cells were trypsinized and resuspended as single cell suspensions. DAPI
staining was used to exclude dead cells.
Instrument SONY SH4800
Software SH4800S and FlowJo (v10.2)
Cell population abundance Data are presented in Extended Data Fig. 1c and Source Data.
Gating strategy Gating strategy is illustrated in Extended Data Fig. 1e to select for single cells with the fixed gate for GFP or mCherry.
Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information.
Article
A map of cis-regulatory elements and 3D genome structures in zebrafish

Hongbo Yang1,11, Yu Luan1,11, Tingting Liu1,11, Hyung Joo Lee2, Li Fang3, Yanli Wang4, Xiaotao Wang1, Bo Zhang4, Qiushi Jin1, Khai Chung Ang5, Xiaoyun Xing2, Juan Wang1, Jie Xu1, Fan Song4, Iyyanki Sriranga1, Chachrit Khunsriraksakul4, Tarik Salameh4, Daofeng Li2, Mayank N. K. Choudhary2, Jacek Topczewski6,7, Kai Wang3, Glenn S. Gerhard8, Ross C. Hardison9, Ting Wang2, Keith C. Cheng5 & Feng Yue1,10 ✉

https://doi.org/10.1038/s41586-020-2962-9
Accepted: 17 September 2020

The zebrafish (Danio rerio) has been widely used in the study of human disease and
development, and about 70% of the protein-coding genes are conserved between the
two species1. However, studies in zebrafish remain constrained by the sparse
annotation of functional control elements in the zebrafish genome. Here we
performed RNA sequencing, assay for transposase-accessible chromatin using
sequencing (ATAC-seq), chromatin immunoprecipitation with sequencing,
whole-genome bisulfite sequencing, and chromosome conformation capture (Hi-C)
experiments in up to eleven adult and two embryonic tissues to generate a
comprehensive map of transcriptomes, cis-regulatory elements, heterochromatin,
methylomes and 3D genome organization in the zebrafish Tübingen reference strain.
A comparison of zebrafish, human and mouse regulatory elements enabled the
identification of both evolutionarily conserved and species-specific regulatory
sequences and networks. We observed enrichment of evolutionary breakpoints at
topologically associating domain boundaries, which were correlated with strong
histone H3 lysine 4 trimethylation (H3K4me3) and CCCTC-binding factor (CTCF)
signals. We performed single-cell ATAC-seq in zebrafish brain, which delineated 25
different clusters of cell types. By combining long-read DNA sequencing and Hi-C, we
assembled the sex-determining chromosome 4 de novo. Overall, our work provides an
additional epigenomic anchor for the functional annotation of vertebrate genomes
and the study of evolutionarily conserved elements of 3D genome organization.
The zebrafish has been an important vertebrate model system for several decades because of its high fecundity, external embryogenesis, rapid embryonic development and nearly transparent embryos. These features have made it an ideal system for the study of vertebrate development and ageing2, comparative genomics3 and human disease modelling. However, there is no comprehensive annotation of the cis-regulatory elements in the zebrafish genome. Although previous genomic studies in zebrafish have provided critical biological insights4–8, most used whole embryos and our understanding of tissue-specific regulators remains limited.
To profile the transcribed regions, chromatin accessibility and DNA methylation patterns in the zebrafish genome, we performed strand-specific RNA sequencing (RNA-seq), ATAC-seq9 and whole-genome bisulfite sequencing (WGBS) in up to eleven zebrafish adult tissues and two embryonic tissues (Fig. 1a, b, Supplementary Table 1). Because histone modifications have been used to predict different classes of potential regulatory elements such as enhancers and repressors10,11, we also performed chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) for a panel of histone modifications, including H3K4me3, H3 lysine 27 acetylation (H3K27ac), H3 lysine 9 dimethylation (H3K9me2) and H3 lysine 9 trimethylation (H3K9me3). To study higher-order chromatin structure and link distal enhancers to their target genes, we performed Hi-C experiments in adult brain and muscle (Fig. 1a, b). Although chromosome 4 is regarded as the ‘rudimentary’ sex chromosome in zebrafish12, the quality of its current assembly is poor owing to the heavy presence of heterochromatin. Therefore, we performed three long-read DNA sequencing experiments (nanopore, 10X Genomics and Bionano optical mapping) in one Tübingen female zebrafish to generate a de novo assembly of chromosome 4. To investigate the cell types and their regulatory elements in the zebrafish
1Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine Northwestern University, Chicago, IL, USA. 2Department of Genetics, The Edison Family Center for Genome
Sciences and Systems Biology, Washington University School of Medicine, St Louis, MO, USA. 3Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of
Philadelphia, Philadelphia, PA, USA. 4Bioinformatics and Genomics Program, The Pennsylvania State University, State College, PA, USA. 5Department of Pathology and Penn State Zebrafish
Functional Genomics Core, College of Medicine, The Pennsylvania State University, Hershey, PA, USA. 6Department of Pediatrics, Northwestern University Feinberg School of Medicine,
Chicago, IL, USA. 7Stanley Manne Children’s Research Institute, Ann and Robert H. Lurie Children’s Hospital of Chicago, Chicago, IL, USA. 8Department of Medical Genetics and Molecular
Biochemistry, Lewis Katz School of Medicine at Temple University, Philadelphia, PA, USA. 9Department of Biochemistry and Molecular Biology, Pennsylvania State University, University
Park, PA, USA. 10Robert H. Lurie Comprehensive Cancer Center of Northwestern University, Chicago, IL, USA. 11These authors contributed equally: Hongbo Yang, Yu Luan, Tingting Liu.
✉e-mail: Yue@northwestern.edu

Article
[Figure 1, panels a–f. a, Tissues (grouped as epiderm, mesoderm, endoderm and embryonic) and assays (RNA-seq, ATAC-seq, WGBS, Hi-C, H3K27ac, H3K4me3, H3K9me3, H3K9me2). b, Browser snapshot of chr4:22,000,000-28,520,000 and chr4:22.80 Mb-23.32 Mb (cd36, magi2) in brain, muscle and liver. c, log(TPM+1) of zebrafish myh6 and human MYH6 across tissues. d, Brain-specific genes and their human orthologues. e, A predicted novel transcript at chr1:660,000-671,000 (WGBS, H3K4me3, H3K27ac and RNA-seq in E-brain; RNA-seq in E-trunk). f, Novel transcripts (n = 8,311): RNA-seq and H3K4me3 intensity from –2 kb to +3 kb around the TSS.]
Fig. 1 | Identification of cis-regulatory elements in the zebrafish genome. a, Tissues and analyses performed in this study. H3K27ac, H3K4me3, H3K9me3 and H3K9me2 represent ChIP-seq analyses with the indicated antibody. b, Snapshot of an example region, showing Hi-C, ChIP-seq, ATAC-seq, WGBS and RNA-seq data in adult zebrafish brain, muscle and liver, using WashU Epigenome Browser. Plots show relative amount, normalized to the range of values. The values on the y-axis for ChIP-seq analyses were normalized to the input. Data range for the Hi-C heat map is 0–40 raw read counts. c, Expression of myh6 in zebrafish and of its orthologue MYH6 in human is heart-specific (n = 2). The values for human expression were from GTEx. Data are mean ± s.e.m. TPM, transcripts per million. d, Box plot of the expression of brain-specific genes in zebrafish (top) (n = 2,481) and the expression of their orthologues in human (bottom) (n = 2,481). The y-axis shows the gene expression value: log10(TPM+1). e, An example of a predicted novel transcript. Vertical scale: 0–20 (H3K27ac and H3K4me3), 0–10 (RNA-seq). f, RNA-seq and H3K4me3 ChIP-seq signals for the predicted 8,311 novel transcripts across all the tissues. In all box plots, horizontal line shows the median, the box encompasses the interquartile range, and whiskers extend to 5th and 95th percentiles.
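The expression axes in Fig. 1c, d are given as log10(TPM+1). As a minimal sketch of that transformation, assuming the standard definition of TPM (length-normalized read counts rescaled so that each sample sums to one million) and not the authors' actual quantification pipeline, the calculation could look like this. The function names and the toy counts are hypothetical.

```python
import math


def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    Assumes one read count and one effective length (in bp) per transcript.
    """
    # Reads per kilobase: normalize each count by transcript length.
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rpk)
    if total == 0:
        return [0.0] * len(rpk)
    # Rescale so that the values of one sample sum to one million.
    return [r / total * 1e6 for r in rpk]


def log_tpm(values):
    """Apply the log10(TPM + 1) transform used on the figure axes."""
    return [math.log10(v + 1.0) for v in values]


# Hypothetical toy data: three transcripts measured in one tissue.
toy_counts = [100, 200, 0]
toy_lengths = [1000, 2000, 500]
toy_tpm = tpm(toy_counts, toy_lengths)
toy_log = log_tpm(toy_tpm)
```

Adding 1 before taking the logarithm keeps unexpressed transcripts at exactly 0 on the plotted scale instead of producing an undefined value.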
brain, we performed single-cell ATAC-seq (scATAC-seq). In total, we generated 161 genomic datasets that comprised over 10 billion reads. To our knowledge, this is the most comprehensive analysis of candidate cis-regulatory elements in zebrafish to date and represents a major resource for comparative genomics and the study of gene regulation in this vertebrate model organism.

Transcriptome analysis

We detected 39,188 transcripts across all tissues using RNA-seq, 14,764 of which exhibited tissue-specific patterns (Extended Data Fig. 1a–c). We identified 13,285 previously unknown transcripts, 8,311 of which were also supported by H3K4me3 peaks at the promoter regions (Fig. 1e, f, Extended Data Fig. 1d, Supplementary Table 2). These 8,311 transcripts include 976 long noncoding RNAs (lncRNAs), 3,596 previously unknown isoforms and 3,739 potential previously uncharacterized protein-coding genes.

Next, we examined whether the expression patterns for the orthologues of tissue-specific genes were conserved between zebrafish and human. Among the 14,764 tissue-specific zebrafish transcripts, 3,737 have a one-to-one human orthologue, 1,747 of which (47%) also show tissue-specific patterns in human (Fig. 1c, d, Supplementary Table 3), suggesting that these genes might have a critical and conserved role in the tissues in which they are uniquely expressed.

Accessibility, promoter and enhancer dynamics

Chromatin accessibility is associated with a wide range of regulatory elements13,14, so we performed ATAC-seq in all 11 adult tissues. We identified 66,771–180,788 ATAC-seq peaks (Supplementary Table 4) in each tissue and merged them into a list of 436,036 non-redundant peaks across all tissues (Fig. 2a, Supplementary Table 5). Of these peaks, 116,353 had not been identified in previous work in whole embryos15–20 (Extended Data Fig. 2a and Supplementary Table 6). As expected, distal ATAC-seq peaks show higher tissue specificity than proximal peaks (Fig. 2b, c).

We then defined the cis-regulatory elements with the following combinations of histone modifications and ATAC-seq peaks: active promoter (H3K27ac, H3K4me3 and ATAC-seq), weak promoter (H3K4me3 and ATAC-seq), active enhancer (distal H3K27ac and ATAC-seq) and heterochromatin (H3K9me2 or H3K9me3 sites). Across all the tissues, we predicted 25,593 active promoters, 40,220 weak promoters, 58,065 active enhancers and 112,445 heterochromatin sites (Extended Data Fig. 2b, c and Supplementary Tables 7–10). A total of 40.9% of the predicted promoters and 62.5% of the predicted enhancers reported in this study were not identified in previous reports6,21–25 (Extended Data Fig. 2a). Of the enhancers, 71.3% were tissue-specific and also showed tissue-specific ATAC-seq signals (Fig. 2d, e). Gene ontology (GO) analysis showed that they were located near genes important for relevant tissue-specific functions (Extended Data Fig. 2d).

To validate the predicted enhancers and their tissue specificities, we used a GFP-based zebrafish embryo reporter assay. Of the 32 tissue-specific enhancers tested, 87.5% (28 out of 32) showed restricted GFP expression (Fig. 2f, Extended Data Fig. 3, Supplementary Table 11).

Single-cell ATAC-seq in zebrafish brain

We performed single-cell ATAC-seq (scATAC-seq) in adult zebrafish brain, generating 654 million usable reads from 19,955 cells. We identified 268,268 non-redundant peaks, covering
[Figure 2, panels a–j. a, Number of ATAC-seq peaks (×10^5) per tissue and their genomic distribution (promoter, 5′UTR, exon, intron, 3′UTR, distal). b, c, Number of proximal and distal ATAC-seq peaks by number of tissues. d, Clustering of enhancers. e, ATAC-seq signal. f, Reporter images for a brain enhancer (elavl3, chr3:48,934,231-48,935,968), a heart enhancer (gata6, chr2:4,311,945-4,313,944), a muscle enhancer (musk, chr10:13,126,525-13,128,100) and a kidney enhancer (zgc153722, chr23:16,855,310-16,856,895). g, Bulk ATAC-seq versus scATAC-seq, PCC = 0.878. h, Peak overlap: 2,625 bulk only, 195,004 shared, 73,264 single-cell only. i, t-SNE1 versus t-SNE2 with clusters 1–25. j, Motif enrichment across clusters for MA0077.1_SOX9 and MA0678.1_OLIG2.]
Fig. 2 | Characterization of tissue-specific cis-regulatory elements. a, Number of ATAC-seq peaks predicted in each tissue and their genomic distribution. b, c, Tissue specificity of proximal and distal ATAC-seq peaks in 11 adult tissues. d, Clustering analysis identified tissue-specific enhancers. Values in the heat map were input-normalized H3K27ac intensity (n = 58,226 enhancers). e, Normalized ATAC-seq intensity in the corresponding enhancer elements shown in d. f, Examples of validated tissue-specific enhancers by GFP reporter assay in zebrafish embryos. For brain enhancer 5, 112 out of 143 surviving embryos showed similar patterns. For heart enhancer 7, 61 out of the 67 surviving embryos showed similar patterns. For muscle enhancer 5, 53 out of the 63 surviving embryos showed similar patterns. For kidney enhancer 1, 47 out of the 82 surviving embryos showed similar patterns. Scale bar, 200 μm. dpf, days post-fertilization. g, Pearson correlation coefficient between aggregated signals of scATAC-seq and bulk ATAC-seq data. Values are the sums of the reads in continuous 10-kb bins, normalized by sequencing depth. h, The overlap between peaks predicted in bulk and scATAC-seq data. i, t-distributed stochastic neighbour embedding (t-SNE) analysis identified 25 clusters in the scATAC-seq data in zebrafish adult brain (n = 19,955). j, Examples of enriched motifs in different clusters from scATAC-seq peaks (n = 19,955).
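The comparison in Fig. 2g sums reads in continuous 10-kb bins, normalizes by sequencing depth and reports a Pearson correlation coefficient. A minimal sketch of that idea follows, under stated assumptions: reads are represented only by their start coordinates on a single chromosome, depth normalization is expressed as counts per million reads, and all names and toy coordinates are hypothetical. It is not the authors' code.

```python
import math
from collections import Counter

BIN_SIZE = 10_000  # 10-kb bins, as stated in the caption


def binned_signal(read_starts, n_bins, bin_size=BIN_SIZE):
    """Sum reads per bin and scale to counts per million reads.

    Reads falling beyond the last bin are not reported, but they
    still count towards the sequencing depth used for scaling.
    """
    depth = len(read_starts)
    if depth == 0:
        return [0.0] * n_bins
    per_bin = Counter(pos // bin_size for pos in read_starts)
    return [per_bin.get(i, 0) / depth * 1e6 for i in range(n_bins)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy read positions covering three 10-kb bins.
bulk_reads = [100, 5000, 9000, 12000, 25000, 25010]
single_cell_reads = [200, 300, 11000, 26000]

bulk_signal = binned_signal(bulk_reads, n_bins=3)
sc_signal = binned_signal(single_cell_reads, n_bins=3)
pcc = pearson(bulk_signal, sc_signal)
```

Because the Pearson coefficient is unchanged by rescaling either vector, the depth normalization matters for comparing bin values directly rather than for the coefficient itself.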
98.7% of the bulk ATAC-seq peaks in the brain (Fig. 2g, h, Extended Data Fig. 4a–c). Among the scATAC-seq peaks, 73,264 were detected only by scATAC-seq, suggesting that there are potentially more regulatory elements in the zebrafish genome than we predicted on the basis of the bulk tissue results.

Using the scATAC-seq data, we identified 25 clusters of cells in the zebrafish brain (Fig. 2i). By identifying the key cell-type-specific transcription factor motifs, we inferred the potential cell type of each cluster, such as oligodendrocyte progenitor cells and prefrontal cortex cells (Extended Data Fig. 4d, e). We quantitatively determined the enrichment of transcription factor motifs in each of the 25 clusters (Fig. 2j, Extended Data Fig. 4f). Many neuronal transcription factors (such as SOX9 and OLIG2) were enriched in different clusters, suggesting potential roles in cell-type-specific regulation in the zebrafish brain.

Heterochromatin and DNA methylation

We performed ChIP-seq for H3K9me2 and H3K9me3 in 11 adult zebrafish tissues (Extended Data Fig. 5a). Across all tissues, we identified 73,777 non-redundant H3K9me2 sites and 68,798 non-redundant H3K9me3 sites. While both H3K9me2 and H3K9me3 are heterochromatic marks, they were located in different parts of the genome, with overlap of about 10% in the same tissue (Extended Data Fig. 5b–d). Short interspersed nuclear elements (SINEs) were enriched in both H3K9me2 and H3K9me3 sites, whereas long terminal repeats (LTRs) were enriched only in H3K9me2 sites (Fig. 3b). Although both H3K9me2 and H3K9me3 sites were depleted of active marks within the same tissue, 20% of these sites overlapped with ATAC-seq peaks or other active marks in other tissues (Fig. 3a, Extended Data Fig. 5e, g), suggesting that heterochromatin regions in one tissue may be active regulatory elements in other tissues.

To study DNA methylation patterns in zebrafish, we performed WGBS in 11 adult tissues with approximately 30× coverage in each dataset. Genome-wide CpG methylation levels were approximately 80% across different tissues, with the exception of the testis, which exhibited higher CpG methylation levels (Extended Data Fig. 6a, b). We also detected increased levels of methylation at the CAC trinucleotide in brain compared with other tissues (Extended Data Fig. 6c), similar to reports in human and mouse26. Unmethylated CpGs were found mostly in CpG islands, gene promoters and 5′ untranslated regions (Extended Data Fig. 6d, e), whereas CpGs in gene bodies and different classes of repetitive elements were heavily methylated (Extended Data Fig. 6e). We also identified unmethylated regions (UMRs) and low-level-methylated regions (LMRs) (Supplementary Table 12). Most UMRs overlapped with candidate promoters and proximal ATAC-seq peaks, whereas LMRs overlapped more with candidate enhancers and

distal ATAC-seq peaks (Extended Data Fig. 6f). We identified differentially methylated regions (DMRs) and tissue-specific hypomethylated DMRs (hypoDMRs) (Extended Data Fig. 6g, Supplementary Table 13). Tissue-specific hypoDMRs were enriched with tissue-specific H3K27ac and ATAC-seq signals (Fig. 3c, d), suggesting that they can potentially identify tissue-specific cis-regulatory elements.

De novo assembly of chromosome 4

Chromosome 4 has been regarded as the rudimentary sex chromosome in zebrafish12. However, extensive heterochromatin and transposable elements have made the analysis of this chromosome challenging. Indeed, we observed strong enrichment of H3K9me3, H3K9me2 and DNA methylation on the long arm of chromosome 4 (Extended Data Fig. 7a, b). When we examined the Hi-C data (Fig. 3e) mapped to the GRCz10 and GRCz11 genomes, there were many aberrant off-diagonal signals, indicating the presence of structural variations between the reference genomes and the genome of the fish used in this study, owing to either assembly error or inter-individual variation. We therefore performed nanopore, 10X Genomics and Bionano optical mapping in an individual female zebrafish of the Tübingen strain. By combining these long-read DNA-sequencing results with the Hi-C data from brain, we de novo assembled a new version of chromosome 4 (Methods, Extended Data Fig. 7c, Supplementary Dataset 1). With the newly assembled genome, we reprocessed the Hi-C data and observed that most of the aberrant signals were no longer visible on the Hi-C map (Fig. 3e, f). We reprocessed the Bionano optical-mapping data and also observed fewer structural-variation events (Extended Data Fig. 7d, e). This newly assembled chromosome 4 will serve as a resource to study sex determination and other processes in zebrafish that involve genes on this chromosome.

[Figure 3, panels a–f. a, Percentage of heterochromatin sites overlapping active marks in the same tissue and in other tissues, for 11 tissues. b, Log2(FC) enrichment of repeat classes (DNA, LINE, LTR, RC, SINE, satellite, unknown) in enhancers, H3K9me3 and H3K9me2 sites. c, DNA methylation, H3K27ac and ATAC-seq from –5 kb to +5 kb in muscle, heart and liver. d, dazl and elavl4 (chr19:20,815,000-20,820,000 and chr8:15,964,500-15,984,000) in brain and testis. e, Hi-C maps for GRCz10 chr4: 0-76.6 Mb, GRCz11 chr4: 0-78.1 Mb and the de novo assembly chr4: 0-74.5 Mb. f, GRCz11 chr4 versus de novo assembled chr4.]

Fig. 3 | Analysis of heterochromatin and repetitive elements and de novo assembly of zebrafish chromosome 4. a, Comparison of H3K9me2 and H3K9me3 sites with active marks (ATAC-seq, H3K4me3 or H3K27ac peaks) from the same tissue (left) and active marks in other tissues (right). The number of heterochromatin regions in each tissue: testis, 36,672; spleen, 20,813; skin, 24,687; muscle, 25,692; liver, 29,117; kidney, 21,821; intestine, 24,072; heart, 14,706; colon, 22,426; blood, 19,082; and brain, 21,596. b, Repetitive-elements enrichment analysis for predicted enhancers, H3K9me3 and H3K9me2 sites. Colour and size indicate fold enrichment. c, DNA methylation levels, H3K27ac ChIP-seq and ATAC-seq signals for tissue-specific hypoDMRs. Tissue-specific hypoDMRs cluster size: muscle, 1,912; heart, 1,708; and liver, 3,386. d, Examples of tissue-specific hypoDMRs. Vertical scale: 0–300 for H3K4me3, 0–100 for H3K27ac and ATAC-seq, 0–1 for WGBS, 0–40 for RNA-seq. ChIP-seq data are normalized by input. ATAC-seq and RNA-seq are normalized by total sequencing depths. For WGBS, scale is the methylation level. e, Brain Hi-C data mapped to GRCz10, GRCz11 and the de novo assembled chromosome 4. Aberrant Hi-C signals were observed when Hi-C reads were mapped to the GRCz10 or GRCz11 reference genome but were not visible when mapped to the de novo assembled chromosome 4. f, Alignment of the de novo assembled chromosome 4 to the GRCz11 reference genome (alignment of 2-kb bins using LASTZ).

Conservation of cis-regulatory elements

Functional elements are often conserved during evolution27. We first examined the sequence conservation of different classes of cis-regulatory elements. Promoters had the highest degree of sequence conservation, while enhancers had a much lower but still significant level of sequence conservation (Fig. 4a, Extended Data Fig. 8b). Of the predicted enhancers with sequences that were not conserved in human or mouse, 88.6% were conserved in other fish species (Fig. 4c, Extended Data Fig. 8c). Furthermore, 60–90% of zebrafish enhancers with sequences that were conserved in human were also predicted as candidate enhancer elements by NIH ENCODE and Roadmap Epigenetics Project (Fig. 4b, Extended Data Fig. 8a). Notably, even for zebrafish enhancers with no detectable sequence conservation in humans, we found that they were conserved in other fish species with increased fish PhyloP scores (Fig. 4c), suggesting that they might be used as enhancers in other fish.

Previous efforts have identified several thousand ultra-conserved noncoding elements (UCNEs) in vertebrates28. There are 2,405, 4,337 and 4,351 UCNEs in zebrafish, mouse and human, respectively. We found that 69% of the zebrafish UCNEs overlapped with the predicted cis-regulatory elements: 15% as promoters, 53% as enhancers and 1% as noncoding RNAs. One such example is shown in Fig. 4d: the zebrafish UCNE SALL3_Anna is conserved in both human and mouse, and this element is predicted as an enhancer sequence in all three species on the basis of the enrichment of H3K27ac signals (another example is shown in Extended Data Fig. 8d). Furthermore, this enhancer has also been validated by transgenic reporter assays in mouse embryos29. Overall, enhancers localized in the ultra-conserved regions were more likely to be predicted as enhancers in mouse and human (30% versus 5.68%).

Linking distal elements to target genes

To link distal ATAC-seq peaks to their target genes, we adopted a correlation-based strategy30. We generated 340,527 ATAC-seq peak-to-gene links with false discovery rate (FDR) below 0.01, which contain 144,886 distal ATAC-seq peaks and 18,792 genes (Extended Data Fig. 9a). Using a similar strategy, we predicted 96,540 enhancer-to-gene links, 37,241 of which were also supported by ATAC-seq peak-to-gene links (Fig. 4f, Extended Data Fig. 9b, c, Supplementary Table 14). The enhancer-to-gene links contain 33,728 putative enhancers and 16,935 genes. We observed higher Hi-C read counts for the predicted enhancer-to-gene pairs than expected at each genomic distance (Fig. 4g, Extended Data Fig. 9d), supporting the linkages between genes and distal elements on the basis of activity correlation.

Tissue-specific transcriptional regulatory network

To identify putative key transcription factors in each tissue, we performed motif analysis in each group of tissue-specific enhancers and identified a set of motifs that were enriched in the same

tissues in zebrafish and human (Fig. 4e). To further probe the similarities in transcription factor connections between zebrafish and human, we performed the three-node network analysis as previously described31 (Methods). We observed that CTCF was predicted as the driver node in most tissues, whereas tissue-specific transcription factors such as NEUROD and MYOD were predicted as middle and passenger nodes in the networks of brain and muscle tissue, respectively (Extended Data Fig. 9e, f). The overall patterns of the three-node networks were highly similar between zebrafish and human (Extended Data Fig. 9f), further demonstrating the value of zebrafish as a system to study human transcription factor regulatory circuits.

[Figure 4, panels a–g. a, Sequence conservation in human (percentage) for exon, promoter, enhancer and random. b, Percentage of enhancers classed as enhancer in same tissue in human, enhancer in other tissues or not an enhancer in human, per tissue. c, Fish phyloP from –10 kb to +10 kb around enhancers not conserved in human, compared with random sequence. d, UCNE SALL3_Anna: H3K27ac and H3K4me3 in brain at zebrafish chr19:22,457,896-22,461,885, mouse chr18:81,347,542-81,351,549 and human chr18:76,480,477-76,485,273, with phyloP and the mouse reporter assay by VISTA. e, Motif enrichment P-values in zebrafish and human tissues for RFX, OLIG2, NEUROD, ATOH1, TLX, ETV, PU.1, EHF, GATA2, GATA6, ETS1, FOXA2, BMAL1, HNF4A, HNF1, CDX2, ERRA, GATA4, GATA3, NUR77, MEF2D, MYOD, SIX1, P53 and ETV1. f, Frequency (×10^3) of the number of enhancers linked per gene (mean = 4.85) and the number of genes linked per enhancer (mean = 2.67; to 1 gene = 34%). g, Hi-C interaction against genomic distance (0–500 kb) for enhancer-to-gene pairs and expected values, with 95% CI.]

Fig. 4 | Conservation of zebrafish cis-regulatory elements and transcriptional networks. a, Percentage of zebrafish exons and cis-regulatory elements that have orthologous sequences in human. Total number for each bar: exon, 1,000; promoter, 25,593; enhancer, 58,065; and random, 1,000. For exons and random, we randomly sampled 1,000 elements and computed the percentage conservation. The simulations were performed 20 times and the mean percentage is shown. b, Percentage of human orthologous sequences of zebrafish enhancers that were predicted as enhancers in human tissues. Total number for each bar: brain, 1,241; blood, 748; colon, 775; heart, 839; intestine, 564; kidney, 173; liver, 402; muscle, 591; skin, 356; and spleen, 1,000. c, Fish PhyloP score for the zebrafish enhancers with sequences that were not conserved in human (number of enhancers in red line is 51,446, blue line is 50,000). d, An ultra-conserved noncoding element predicted as a brain enhancer in zebrafish, mouse and human. This enhancer element has been validated by transgenic reporter assay in mouse (hs1056 in the VISTA Enhancer Browser). e, Heat map showing transcription factor motif enrichment in tissue-specific enhancers in zebrafish and human. f, Linking distal enhancers to their target genes by correlation of tissue-specific activity. Left, distribution of the predicted number of enhancers per gene. Right, distribution of predicted number of genes per enhancer. g, Validation of the predicted enhancer-to-gene pairs by Hi-C interaction counts in brain.

Higher-order genome structure in zebrafish

To study higher-order chromatin structure in zebrafish, we generated high-resolution Hi-C data (10 kb) in the adult brain and muscle (Extended Data Fig. 10a), with about 2.1 and 1.4 billion paired-end reads, respectively. Replicates of the Hi-C experiments were highly reproducible32. We predicted the A/B compartments and found that their genomic coverages in these two tissues were similar. H3K27ac, H3K4me3 and ATAC-seq signals were enriched in the A compartment, whereas H3K9me2 and H3K9me3 signals were enriched in the B compartment (Extended Data Fig. 10b). We identified 5,348 regions with switched A/B compartments between the two tissues, and these regions were associated with altered gene expression and H3K27ac signals (Extended Data Fig. 10c). We predicted 1,350 topologically associating domains (TADs) in the brain and 1,238 TADs in the muscle (Extended Data Fig. 10d, Supplementary Table 15). Most of the TADs were shared between the two tissues (Extended Data Fig. 10b, e) and TAD boundaries were enriched for CTCF-binding sites, SINEs and satellite elements (Extended Data Fig. 10f–h).

We identified 7,708 and 5,312 chromatin loops in the adult brain and muscle, respectively (Fig. 5a, Supplementary Table 16). The majority of the loop anchors had convergent CTCF-binding motifs (72% in muscle and 63% in brain). Of the predicted loops in the brain, 98.6% were between regions that contain either at least one promoter or one enhancer, and 91.6% of enhancer–promoter loops overlapped with predicted enhancer–promoter or distal ATAC-seq peak–promoter linkage pairs (Fig. 5b). We performed motif analysis to identify the transcription factors that may have a role in forming the chromatin loops. CTCF and BORIS were enriched in shared loops (Fig. 5a), and tissue-specific transcription factors were enriched in tissue-specific chromatin interactions (Fig. 5a). For example, RFX and NeuroD2 were enriched in brain-specific loops, whereas two muscle-specific master regulators, Myf5 and Ascl1, were enriched in muscle-specific loops (Fig. 5c).

Zebrafish genome evolution and TADs

TADs have been shown to be conserved among different species33–36. To investigate the relationship between TADs and zebrafish genome evolution, we first identified three sets of zebrafish evolutionary breakpoints by aligning its genome against chicken, mouse and human, respectively. We then compared the breakpoints with TAD annotations in zebrafish and observed that 80.5% of breakpoints (984 of 1,223 zebrafish-to-human breakpoints) were located near TAD boundaries, but depleted towards the centre of TADs (Fig. 5d, Extended Data Fig. 11a). We divided TADs into two groups: TADs containing a breakpoint and TADs not containing a breakpoint. TADs without breakpoints had stronger interaction frequencies in the middle than TADs with breakpoints (Fig. 5e, Extended Data Fig. 12d). Further, the expression patterns of genes across different tissues in the TADs without breakpoints were more correlated with those of their homologues in human (Fig. 5f) than the other group, suggesting that there is an association between TAD stability and conservation of the expression pattern. This may be because strong chromatin interactions contribute to TAD stability during evolution, or because the breaking of TADs with strong interactions is selected against in evolution, as these interactions and the genes involved in them may be physiologically important for zebrafish.

Next, we divided zebrafish TAD boundaries into two classes, boundaries overlapping with breakpoints and boundaries not overlapping with breakpoints. We observed higher H3K4me3 signals at TAD boundaries, as previously described36. We found a much higher level of H3K4me3 at TAD boundaries with breakpoints (Fig. 5g, Extended Data Fig. 11b, c). We also confirmed similar higher H3K4me3 enrichment in human or mouse TAD boundaries with evolutionary breakpoints (Extended

[Figure 5, panels a–h. a, Hi-C loops in brain and muscle with motif –log(P) values: brain-specific (n = 4,610), shared (n = 3,098) and muscle-specific (n = 2,219) loops; motifs shown include CTCF, RFX2, NeuroD2, BORIS, X-box, Ascl1 and Myf5. b, Pie charts of loop-anchor annotation (E–E, E–P, P–P, none) and of support by linkage prediction (enhancer, ATAC-seq, both). c, Shared loops (akt3a, chr13:13,000,000-17,500,000), brain-specific loops (zbtb18, chr13:10,000,000-11,800,000) and a muscle-specific loop (traip, chr11:33,750,000-35,600,000), with H3K27ac and RNA-seq in brain and muscle. d, Number of breakpoints (BPs) from –500 kb to +500 kb around TADs, zebrafish versus mouse and zebrafish versus human. e, TADs without BPs and TADs with BPs, from –1 Mb to +1 Mb around the centre. f, Correlation with and without BPs. g, H3K4me3 signal from –500 kb to +500 kb around the boundary, with and without BPs. h, Pol2 ChIP-seq and GRO-seq signal from –200 kb to +200 kb around the breakpoint, with and without CTCF.]
Fig. 5 | Higher-order chromatin structure and zebrafish genome evolution. a, Aggregate peak analysis plot and motif analysis of tissue-specific or shared chromatin loops. In each panel, n is the number of loops in that group. b, Left, annotation of cis elements in the predicted loop anchors in brain with a total of 7,710 loops in the pie chart. Right, comparison of promoter–enhancer chromatin loops with correlation-based linkage between ATAC-seq or histone modification-based enhancer–gene pairs with a total of 4,996 loops in the pie chart. c, Examples of shared, brain-specific and muscle-specific chromatin loops. d, Relative position of evolutionary breakpoints to TADs. Breakpoints were between zebrafish and mouse (left) or between zebrafish and human (right). In all cases, we found that the evolutionary breakpoints were enriched at zebrafish TAD boundaries and depleted from the centre of TADs. e, TADs without breakpoints (top) have stronger internal interactions than TADs with breakpoints (bottom). f, Expression pattern of genes in TADs without an evolutionary breakpoint is more highly conserved than genes in TADs that contain a breakpoint. For each gene, we collected its expression profile across the same ten tissues in both zebrafish and human, and computed a Spearman correlation coefficient between the profiles for each gene. Number of gene pairs: without BPs, 4,625; with BPs, 3,918; P = 3.56 × 10^−26, two-sided Mann–Whitney U test. g, H3K4me3 signals were higher in TAD boundaries with breakpoints than TAD boundaries without breakpoints. h, Higher transcriptional activities at TAD boundaries with breakpoints and containing CTCF binding sites in human GM12878 cells. Number of breakpoints: with CTCF, 639; without CTCF, 625. K562 cell data is shown in Extended Data Fig. 12c. Results from 17 additional vertebrates are shown in Extended Data Figs. 11a, b, 12d.
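The per-gene statistic in Fig. 5f is a Spearman correlation between a gene's expression profile across the same ten tissues in zebrafish and in human. The sketch below shows one way such a coefficient can be computed, as a Pearson correlation on average ranks. The function names and the expression values are hypothetical, and the subsequent group comparison (the two-sided Mann–Whitney U test) is not reproduced here.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression of one gene across ten matched tissues.
zebrafish_profile = [5.0, 3.2, 0.1, 8.4, 2.2, 7.7, 1.5, 4.4, 6.1, 0.9]
human_same_order = [10.0 * v for v in zebrafish_profile]
human_reversed = [-v for v in zebrafish_profile]

rho_conserved = spearman(zebrafish_profile, human_same_order)
rho_inverted = spearman(zebrafish_profile, human_reversed)
```

A rank-based coefficient depends only on the ordering of tissues by expression, so it is insensitive to differences in absolute expression scale between the two species' datasets.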
Data Fig. 11d–f). As a control, we observed similar amounts of H3K27ac or ATAC-seq enrichment between the two groups of TAD boundaries (Extended Data Fig. 12a, b). Notably, an earlier report showed H3K4me3 signal enrichment at recombination hotspots in mouse37, and our finding in zebrafish further suggests its potential association with genome stability and evolution.

Previous work has suggested a link between transcription at CTCF-containing TAD boundaries and their potential role in translocations38–40. Therefore, we investigated the transcriptional status at the evolutionary breakpoints at TAD boundaries using the Pol2, CTCF ChIP-seq and GRO-seq data in GM12878 and K562 human cells. We observed that there were much higher transcription activities at breakpoints overlapping with CTCF-containing TAD boundaries, compared with those at TAD boundaries without CTCF (Fig. 5h, Extended Data Fig. 12c).

A key feature of the zebrafish genome is an extra genome-duplication event compared with other vertebrates41. There were 2,456 paralogous gene pairs annotated in Ensembl and the paralogues show similar expression patterns across all different tissues, with a median Pearson correlation of 0.458 (Extended Data Fig. 12e). We analysed paralogue pairs located on the same chromosome and observed that paralogues located in the same TADs have a higher correlation in gene expression

patterns than paralogues located in different TADs (Extended Data Fig. 12f).

In summary, we report a comprehensive annotation of the zebrafish genome, and we described both conserved and divergent gene-regulatory networks and 3D genome structures between zebrafish and human. The breadth and depth of the data establish a genomic foundation for conducting further human disease modelling and biological studies in zebrafish.

Online content

Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2962-9.

17. Liu, G., Wang, W., Hu, S., Wang, X. & Zhang, Y. Inherited DNA methylation primes the establishment of accessible chromatin during genome activation. Genome Res. 28, 998–1007 (2018).
18. Marlétaz, F. et al. Amphioxus functional genomics and the origins of vertebrate gene regulation. Nature 564, 64–70 (2018).
19. Meier, M. et al. Cohesin facilitates zygotic genome activation in zebrafish. Development 145, dev156521 (2018).
20. Torbey, P. et al. Cooperation, cis-interactions, versatility and evolutionary plasticity of multiple cis-acting elements underlie krox20 hindbrain regulation. PLoS Genet. 14, e1007581 (2018).
21. Paik, E. J. et al. A Cdx4–Sall4 regulatory module controls the transition from mesoderm formation to embryonic hematopoiesis. Stem Cell Reports 1, 425–436 (2013).
22. Kang, J. et al. Modulation of tissue repair by regeneration enhancer elements. Nature 532, 201–206 (2016).
23. Kaufman, C. K. et al. A zebrafish melanoma model reveals emergence of neural crest identity during melanoma initiation. Science 351, aad2197 (2016).
24. Goldman, J. A. et al. Resolving heart regeneration by replacement histone profiling. Dev. Cell 40, 392–404 (2017).
25. Pérez-Rico, Y. A. et al. Comparative analyses of super-enhancers reveal conserved elements in vertebrate genomes. Genome Res. 27, 259–268 (2017).
26. Lister, R. et al. Global epigenomic reconfiguration during mammalian brain development. Science 341, 1237905 (2013).
27. Visel, A. et al. Ultraconservation identifies a small subset of extremely constrained
1. Howe, K. et al. The zebrafish reference genome sequence and its relationship to the developmental enhancers. Nat. Genet. 40, 158–160 (2008).
human genome. Nature 496, 498–503 (2013). 28. Dimitrieva, S. & Bucher, P. UCNEbase–a database of ultraconserved non-coding elements
2. Gerhard, G. S. et al. Life spans and senescent phenotypes in two strains of Zebrafish and genomic regulatory blocks. Nucleic Acids Res. 41, D101–D109 (2013).
(Danio rerio). Exp. Gerontol. 37, 1055–1068 (2002). 29. Visel, A., Minovitsky, S., Dubchak, I. & Pennacchio, L. A. VISTA Enhancer Browser—a
3. Lamason, R. L. et al. SLC24A5, a putative cation exchanger, affects pigmentation in database of tissue-specific human enhancers. Nucleic Acids Res. 35, D88–D92 (2007).
zebrafish and humans. Science 310, 1782–1786 (2005). 30. Corces, M. R. et al. The chromatin accessibility landscape of primary human cancers.
4. Vastenhouw, N. L. et al. Chromatin signature of embryonic pluripotency is established Science 362, eaav1898 (2018).
during genome activation. Nature 464, 922–926 (2010). 31. Neph, S. et al. Circuitry and dynamics of human transcription factor regulatory networks.
5. Bogdanovic, O. et al. Dynamics of enhancer chromatin signatures mark the transition Cell 150, 1274–1286 (2012).
from pluripotency to cell specification during embryogenesis. Genome Res. 22, 32. Yang, T. et al. HiCRep: assessing the reproducibility of Hi-C data using a stratum-adjusted
2043–2053 (2012). correlation coefficient. Genome Res. 27, 1939–1949 (2017).
6. Kaaij, L. J. et al. Enhancers reside in a unique epigenetic environment during early 33. Krefting, J., Andrade-Navarro, M. A. & Ibn-Salem, J. Evolutionary stability of topologically
zebrafish development. Genome Biol. 17, 146 (2016). associating domains is associated with conserved gene regulation. BMC Biol. 16, 87
7. Aday, A. W., Zhu, L. J., Lakshmanan, A., Wang, J. & Lawson, N. D. Identification of cis (2018).
regulatory features in the embryonic zebrafish genome through large-scale profiling of 34. Lazar, N. H. et al. Epigenetic maintenance of topological domains in the highly
H3K4me1 and H3K4me3 binding sites. Dev. Biol. 357, 450–462 (2011). rearranged gibbon genome. Genome Res. 28, 983–997 (2018).
8. Vesterlund, L., Jiao, H., Unneberg, P., Hovatta, O. & Kere, J. The zebrafish transcriptome 35. Fishman, V. et al. 3D organization of chicken genome demonstrates evolutionary
during early development. BMC Dev. Biol. 11, 30 (2011). conservation of topologically associated domains and highlights unique architecture of
9. Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J. Transposition of erythrocytes’ chromatin. Nucleic Acids Res. 47, 648–665 (2019).
native chromatin for fast and sensitive epigenomic profiling of open chromatin, 36. Dixon, J. R. et al. Topological domains in mammalian genomes identified by analysis of
DNA-binding proteins and nucleosome position. Nat. Methods 10, 1213–1218 (2013). chromatin interactions. Nature 485, 376–380 (2012).
10. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human 37. Smagulova, F. et al. Genome-wide analysis reveals novel molecular features of mouse
genome. Nature 489, 57–74 (2012). recombination hotspots. Nature 472, 375–378 (2011).
11. Yue, F. et al. A comparative encyclopedia of DNA elements in the mouse genome. Nature 38. Canela, A. et al. Genome organization drives chromosome fragility. Cell 170, 507–521
515, 355–364 (2014). (2017).
12. Anderson, J. L. et al. Multiple sex-associated regions and a putative sex chromosome in 39. Gothe, H. J. et al. Spatial chromosome folding and active transcription drive DNA fragility
zebrafish revealed by RAD mapping and population genomics. PLoS ONE 7, e40701 and formation of oncogenic MLL translocations. Mol. Cell 75, 267–283 (2019).
(2012). 40. Canela, A. et al. Topoisomerase II–induced chromosome breakage and translocation is
13. Klemm, S. L., Shipony, Z. & Greenleaf, W. J. Chromatin accessibility and the regulatory determined by chromosome architecture and transcriptional activity. Mol. Cell 75,
epigenome. Nat. Rev. Genet. 20, 207–220 (2019). 252–266 (2019).
14. Meuleman, W. et al. Index and biological spectrum of human DNase I hypersensitive sites. 41. Postlethwait, J. H. et al. Vertebrate genome evolution and the zebrafish gene map.
Nature 584, 244–251 (2020). Nat. Genet. 18, 345–349 (1998).
15. Quillien, A. et al. Robust identification of developmentally active endothelial enhancers
in zebrafish using FANS-assisted ATAC-seq. Cell Rep. 20, 709–720 (2017). Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in
16. Letelier, J. et al. Evolutionary emergence of the rac3b/rfng/sgca regulatory cluster published maps and institutional affiliations.
refined mechanisms for hindbrain boundaries formation. Proc. Natl Acad. Sci. USA 115,
E3731–E3740 (2018). © The Author(s), under exclusive licence to Springer Nature Limited 2020

Article

Methods
No statistical methods were used to predetermine sample size. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.

Adult tissue ChIP-seq
One-year-old adult Tübingen zebrafish (sex information shown in Supplementary Table 17) were dissected to separate tissues that were washed twice in 1× PBS buffer before flash freezing on dry ice. Collection of peripheral blood from adult fish was performed as described42. All procedures on live animals were approved by the Institutional Animal Care and Use Committee (IACUC) at the Pennsylvania State University (PRAMS201445659). At the beginning of ChIP-seq, all tissues (except peripheral blood) were ground in liquid nitrogen and fixed with 1% formaldehyde at room temperature for 15 min. 2.5 M glycine was added to a final concentration of 0.2 M and incubated at room temperature for 5 min to quench the fixation. Fixed tissues were then washed once with cold 1× PBS. Tissue pellets were resuspended and incubated on ice for 10 min in 100 μl of ChIP-seq lysis buffer (20 mM Tris-HCl, pH 8.0, 1% SDS, 50 mM EDTA, 1× proteinase inhibitor cocktail). Next, 900 μl cold 1× TE buffer (10 mM Tris-HCl, 1 mM EDTA, pH 8.0) was added to dilute the SDS, and the nuclei suspension was sonicated using a Covaris E220 with the following parameters: 140 W, duty factor 5, 200 per burst. To check the chromatin fragmentation size, 20 μl of input chromatin was reverse cross-linked in elution buffer (20 mM Tris-HCl, pH 8.0, 1% SDS, 1 mM EDTA) at 65 °C overnight, treated with RNase A and proteinase K and purified by phenol–chloroform extraction. Input DNA was then loaded on a Lonza flash gel to ensure that the majority of DNA was between 100 and 300 bp. To prepare the antibody–beads complex, 3 μg of histone H3K27ac antibody (Active Motif, 39133), H3K4me3 antibody (EMD Millipore, 07-473), H3K9me2 antibody (Cell Signaling, 4658) or H3K9me3 antibody (Abcam, ab8898) was mixed with 12 μl M-280 sheep anti-rabbit (ThermoFisher, 11203D) or sheep anti-mouse IgG Dynabeads (ThermoFisher, 11201D) in 150 μl of 5 mg ml⁻¹ BSA in 1× PBS buffer, with rotation at 4 °C for 3 h. After incubation, the antibody–beads complexes were washed once with BSA in 1× PBS buffer. About 200 μg chromatin was used per immunoprecipitation. An equal volume of master mix (1× TE, 2% Triton X-100, 0.2% sodium deoxycholate, 2× proteinase inhibitor cocktail) was mixed with 200 μg chromatin and then incubated with antibody–beads complexes overnight with rotation. The next morning, the beads were washed 5 times with cold RIPA wash buffer (20 mM Tris-HCl, pH 8.0, 1% NP-40, 0.7% sodium deoxycholate, 500 mM LiCl, 1 mM EDTA, 1× proteinase inhibitor cocktail). Then the bead-bound chromatin was eluted using 150 μl of elution buffer at 65 °C for 30 min. To prepare the library, eluted chromatin was reverse cross-linked and purified by phenol–chloroform extraction. Then DNA was end-repaired with the END-IT DNA end-repair kit (Epicentre, ER81050) according to the kit protocol, adenylated using Klenow fragment (3′→5′ exo-) (NEB, M0212S), ligated with Illumina TruSeq adaptor (Illumina, FC-121-3001) and subsequently amplified by PCR (Roche, kk2601). The quality and quantity of all the libraries were checked using a BioAnalyzer High Sensitivity DNA Kit (Agilent).

Embryonic tissue ChIP-seq
Zebrafish embryonic tissue ChIP-seq was performed similarly to the adult tissue with a few modifications. For embryonic trunk, 50 trunks of 1-dpf embryos were dissected and digested in 500 μl of 0.25% trypsin at room temperature for 20 min. The reaction was then neutralized with FBS. Trunk cells were washed once with cold 1× PBS and cross-linked with 1% formaldehyde at room temperature for 12 min. For embryonic neuronal cells, Tg(HuC:Kaede) transgenic fish were crossed with wild-type Tübingen fish. At 1 dpf, embryos were checked under the fluorescence microscope, and green positive embryos were selected, dechorionated, and pipetted in calcium-free Ringer’s solution to remove the yolk. The supernatant was removed, and 500 μl of 0.25% trypsin was added to digest embryos at room temperature for 20 min. After neutralization with 500 μl of FBS and one wash with cold 1× PBS, the digested cells were sorted, and green fluorescent cells, which corresponded to embryonic neuronal cells, were collected and cross-linked with 1% formaldehyde at room temperature for 12 min.

RNA-seq
For each RNA-seq experiment, tissues from at least two Tübingen fish were combined to use as one replicate. For embryonic trunk, ten 1-dpf fish were dechorionated with pronase, and the trunk was separated for RNA-seq. For embryonic neurons, green cells from Tg(HuC:Kaede) fish were sorted by FACS, and approximately 20,000 cells were used for one replicate. The tissue RNA was extracted using Trizol according to the manufacturer’s protocol (Invitrogen). The cDNA libraries were constructed using the SureSelect Strand Specific RNA Library Preparation Kit (Agilent) according to the manufacturer’s protocol. In brief, polyA RNA was purified from 1,000 ng of total RNA using oligo (dT) beads (Invitrogen). Extracted RNA was first fragmented, followed by reverse transcription, end-repair, adenylation, adaptor ligation, and subsequent PCR amplification. The final product was checked for size distribution and concentration using a BioAnalyzer High Sensitivity DNA Kit (Agilent) and Kapa Library Quantification Kit (Kapa Biosystems).

ATAC-seq
Tübingen adult tissues were freshly dissected and processed immediately for ATAC-seq. In brief, the tissues were resuspended in 1 ml of lysis buffer (1× PBS, 0.2% NP-40, 5% BSA, 1 mM DTT, protease inhibitors), followed by Dounce homogenization with a loose pestle using 20 strokes. The lysate was then filtered through a 40-μm cell strainer, and nuclei were collected at 500g for 5 min. Tagmentation was performed immediately according to the previously reported ATAC-seq protocol9.

Whole-genome bisulfite sequencing
Tübingen adult tissues were dissected, and the genomic DNA was extracted using the DNeasy Blood & Tissue Kit (Qiagen, 69504). Then, 1 μg of genomic DNA of each tissue was subjected to bisulfite conversion using the EZ DNA Methylation-Gold Kit. The final libraries were prepared using the Accel-NGS Methyl-Seq DNA Library Kit.

BioNano optical mapping
Genomic DNA from one Tübingen female muscle was extracted for BioNano optical mapping. DNA extraction was done according to the BioNano Prep Animal Tissue DNA Isolation Fibrous Tissue Protocol-30071. The homogenized genomic DNA was then directly labelled using DLE enzyme (BioNano, 80005) according to the BioNano Prep Direct Label and Stain (DLS)-30206 protocol. The labelled and stained genomic DNA was then loaded onto a Saphyr chip, and 407 Gb of data were collected for genome assembly.

scATAC-seq
scATAC-seq was performed using one female brain and one male brain on the 10X Genomics platform. To isolate nuclei, a freshly dissected single brain was transferred to 1 ml NbActiv1 medium (BrainBits, NbActiv1 500) with a wide-bore pipette tip to break the tissue into small pieces. Then the tissue was further fragmented using regular-bore pipette tips followed by filtering through a 30-μm cell strainer. The isolated cells were spun down at 500g for 5 min at 4 °C and then lysed in 100 μl chilled 0.1× lysis buffer (10 mM Tris HCl, pH 7.5, 10 mM NaCl, 3 mM MgCl2, 1% BSA, 0.01% Tween-20, 0.01% NP40 and 0.001% digitonin) on ice for 5 min. Then, 1 ml chilled wash buffer (10 mM Tris HCl, pH 7.5, 10 mM NaCl, 3 mM MgCl2, 1% BSA, 0.1% Tween-20) was added to the lysed cells and cells were spun down at 500g for 5 min at 4 °C. Finally, 300 μl chilled
diluted nuclei buffer (10X Genomics, 2000153/2000207) was added to resuspend the nuclei. The nuclei were filtered again using a 30-μm cell strainer before cell counting. Around 12,000 nuclei were used for one Tn5 tagmentation reaction, and the scATAC-seq library was prepared and sequenced according to the 10X Genomics user guide.

Zebrafish tissue Hi-C
Hi-C experiments on adult zebrafish brain and muscle tissues were performed according to a previously published protocol43 with a few modifications. For brain replicate 1, two adult Tübingen zebrafish brains were gently ground into small granules in liquid nitrogen and resuspended in 1 ml cold hypotonic buffer (20 mM Tris-HCl, pH 8.0, 10 mM NaCl, 20 mM EDTA). For brain replicate 2, only one female Tübingen brain was used. Granular brains were then Dounce homogenized with a loose pestle using 20 strokes. The upper layer of the homogenate was carefully transferred into a new tube and fixed with 2% formaldehyde at room temperature for 10 min. Glycine (0.2 M) was added to stop fixation. For muscle tissue, 60 mg of Tübingen muscle was first chopped into small pieces and digested with 0.25% trypsin at room temperature for 30 min. After neutralization with FBS, muscle cells were resuspended in cold 1× PBS and fixed with 2% formaldehyde at room temperature for 10 min. Glycine (0.2 M) was then added to stop fixation. Two muscle Hi-C experiments were performed using tissues from female and male Tübingen fish, respectively.

Enhancer reporter assays
Selected potential tissue-specific enhancers were evaluated using a Tol2 transposon-mediated zebrafish transgenesis approach as previously described44. Selected enhancers were PCR amplified from Tübingen genomic DNA and subcloned upstream of the hsp70 promoter-eGFP cassette in the pT2HE vector (Supplementary Table 18 for primers). These enhancer reporter plasmids were subsequently co-injected with transposase mRNA into one-cell-stage Tübingen wild-type zebrafish embryos. eGFP expression patterns were monitored using a Zeiss SteREO DiscoveryV8 microscope with AxioVision Rel.4.8 software, at four different developmental time points: 24–36 hours post-fertilization, 2 dpf, 3 dpf and 5 dpf. Expression patterns were recorded only if eGFP signal was consistently displayed by at least 30–40% of total embryos (approximately 100 embryos were analysed per enhancer).

Novel transcript identification
All the RNA-seq reads were mapped to the genome GRCz10 using TopHat2 (ref. 45), with ‘--mate-inner-dist 50 --mate-std-dev 1000 --read-mismatches 2 --read-edit-dist 2’ parameters excluding ‘-G’, then assembled using Cufflinks46. The cuffmerge function was applied to combine Cufflinks assemblies from each library. The cuffcompare function was then used to analyse assembled transcripts of all libraries and to merge assembled transcripts with ENSEMBL and RefSeq reference annotations (ftp://ftp.ncbi.nlm.nih.gov/genomes/Danio_rerio/ARCHIVE/ANNOTATION_RELEASE.105/Gnomon/ and ftp://ftp.ensembl.org/pub/release-91/gtf/danio_rerio/). Novel transcripts were defined as those with a u, x, j or i class code that were present in both biological replicates. Class code u represents no overlap with any known transcript; x represents exonic overlap with known transcripts on the opposite strand; j represents potential novel isoforms in which at least one splice junction is shared with a known transcript; and i represents a transcript falling entirely within a reference intron.

lncRNA identification and novel transcript annotation
To identify lncRNAs from novel transcripts, we first selected novel transcripts with more than one exon and longer than 150 bp. The coding potential of these transcripts was predicted using CPAT with a zebrafish model47. The transcripts with a coding potential score less than 0.38 were identified as lncRNAs. The transcripts with a coding potential score greater than 0.38 were considered as novel protein-coding gene candidates. TransDecoder (https://transdecoder.github.io/) was then used to predict open reading frames of these novel protein-coding gene candidates. The predicted coding sequences were searched with BLASTp against known protein databases of other species (Xenopus, cattle, pig, chicken, mouse and human) to identify homologues in these species (cut-off: 10⁻³). The GTF files for the predicted lncRNAs and the novel transcripts were integrated with reference GTF files for further analysis.

Quantification for RNA-seq signals and identification of tissue-specific genes
RNA-seq reads were trimmed using TrimGalore (https://github.com/FelixKrueger/TrimGalore). We aligned trimmed RNA-seq reads to the zebrafish genome (GRCz10) and generated bigwig files using STAR48. QC and expression values for the RNA-seq libraries were computed with the RSEM package49 and the previously generated GTF files were used for the genome annotation. To correct the batch effect across all libraries prepared at different times, we used the limma package in R on the log2(TPM + 1) matrix, and genes with TPM < 1 in two biological replicates in all tissues were further filtered out. Pearson correlation coefficients were then computed between biological replicates using the read counts of 10 kb-binned matrices. The average TPM value between two biological replicates was used to represent the gene expression value of each tissue to identify the tissue-specific expressed genes. TPM values for the same gene in different tissues were transformed into Z-scores to identify the tissue-specific genes at a threshold of Z > 2 and a minimum of threefold change.

Comparison of orthologous genes between zebrafish and human
To compare the orthologous gene expression pattern between zebrafish and human, we downloaded the TPM matrix from the GTEx dataset from the same tissue (except testis and embryonic tissues), and the median TPM value of replicates was used to represent the gene expression value of each tissue. The one-to-one orthologous genes detected by BioMart were selected for the comparison.

ATAC-seq data processing
ATAC-seq library adapters were detected and trimmed using detect_adapter.py (https://github.com/ENCODE-DCC/atac-seq-pipeline/blob/master/src/detect_adapter.py) and cutadapt50. The trimmed reads were aligned using bowtie2 (ref. 51) with the following parameters: -X 2000 --mm. PCR duplicates were removed using Picard (http://broadinstitute.github.io/picard/). We calculated the Pearson correlation coefficient between two biological replicates using the read counts of 10 kb-binned matrices and merged paired files with high correlation. Then we processed the ATAC-seq data using parameters recommended by ENCODE-DCC (https://github.com/ENCODE-DCC/atac-seq-pipeline). Peaks were filtered to keep those with P < 0.00001 and q < 0.01, and peak lengths were fixed to 500 bp. We only selected the peak with the most significant signal if several peaks overlapped in each tissue. Then we separated the proximal ATAC-seq peaks (average n = 23,233), which were defined as overlapping with any transcription start site (TSS) region (2.5 kb upstream to 500 bp downstream of the TSS), and distal ATAC-seq peaks (average n = 122,979).
Chromatin accessibility can reveal a full range of regulatory elements, including both active and poised enhancers and promoters. To generate the zebrafish open chromatin landscape, we merged the ATAC-seq peaks across 11 tissues. The peaks with the most significant signals were selected as representative open chromatin regions, and all the peaks overlapping with representative open chromatin regions by at least one base pair were removed. In total, we identified 436,035 representative open chromatin regions in the zebrafish genome. We annotated the ATAC-seq peaks using the ChIPseeker package. Footprint was identified by the HINT software v.0.13.0 (Hmm-based IdeNtification
of Transcription factor footprints)52 based on ATAC-seq data. In brief, ATAC-seq narrow peaks were used as input; footprint regions were filtered by footprint score > 10; and transcription factor motifs overlapping with footprints were identified using the MOODS package v.1.9.4 (ref. 53) (https://github.com/jhkorhonen/MOODS), with motifs from the HOCOMOCO database54 (http://hocomoco11.autosome.ru/).

H3K27ac and H3K4me3 ChIP-seq data processing
We mapped the ChIP-seq reads to the zebrafish genome GRCz10 using BWA aligner55. The mapped reads with MAPQ less than 30 were removed, and PCR duplicated reads were removed by Picard. We calculated the Pearson correlation coefficient between two biological replicates and merged paired files with a high correlation. For the H3K27ac and H3K4me3 histone marks, we used parameters recommended by ENCODE-DCC (https://github.com/ENCODE-DCC/chip-seq-pipeline2). In brief, candidate narrow peaks were first selected with −logP > 5 and −logQ > 2. Reads per million (RPM) of immunoprecipitate (IP) data ($\mathrm{RPM}_{\mathrm{IP}}$) and input data ($\mathrm{RPM}_{\mathrm{input}}$) in each peak region were calculated, and qualified peaks were required to pass the threshold of twofold enrichment ($\mathrm{RPM}_{\mathrm{IP}} \ge 2 \times \mathrm{RPM}_{\mathrm{input}}$) and $\mathrm{RPM}_{\mathrm{IP}} - \mathrm{RPM}_{\mathrm{input}} > 1$.

$$\mathrm{RPM}_{\mathrm{IP}} = \frac{\text{Read count of peak in IP}}{\text{Number of non-duplicated reads in IP (in millions)}}$$

$$\mathrm{RPM}_{\mathrm{input}} = \frac{\text{Read count of peak in input}}{\text{Number of non-duplicated reads in input (in millions)}}$$

overlapped with representative open chromatin regions but without H3K27ac peaks. Active promoters were defined by H3K4me3 peaks that overlapped with representative open chromatin regions and H3K27ac peaks. Active enhancers were defined as H3K27ac peaks not overlapping with H3K4me3 peaks or TSS regions (2.5 kb upstream to 500 bp downstream of the TSS) but overlapping with representative open chromatin 500-bp flanking regions. As testis contains widespread H3K4me3 signals across the whole genome, instead of using H3K4me3 peaks, we used the annotated TSS regions that overlapped with H3K27ac peaks and representative open chromatin regions to define active promoters. Those that overlapped with representative open chromatin regions without H3K27ac peaks defined weak promoters, whereas active enhancers were defined by H3K27ac peaks that were away from the annotated TSS regions but overlapped with representative open chromatin regions.
To generate the non-redundant union sets of each type of cis-regulatory element, we merged the active enhancers, active promoters and weak promoters from different tissues, respectively, if the peaks were within 500 bp, used the midpoints to represent the locations of the merged active enhancers, active promoters and weak promoters, and then fixed the length of each element to 2 kb.
We defined heterochromatin in each tissue by merging the H3K9me3 and H3K9me2 peaks within 500 bp. We converted the cis-regulatory elements from zebrafish genomic locations (danRer10) to human and mouse genomic locations (hg38 and mm10, respectively), using the liftOver tool with the central 100 bp of each element and requiring minMatch > 0.1. The danRer10ToHg38/Mm10.over.chain files were modified based on the zebrafish conserved CNE database58 (http://zebrafish.stanford.edu).
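The RPM enrichment criteria stated for H3K27ac and H3K4me3 peaks (at least twofold enrichment over input, and an RPM difference greater than 1) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Peak` class, the function names and the example read counts are hypothetical and serve only to show how the two thresholds combine.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    """Hypothetical container for one candidate peak (illustrative only)."""
    name: str
    ip_reads: int      # reads overlapping the peak in the IP library
    input_reads: int   # reads overlapping the peak in the input library


def rpm(read_count: int, total_nondup_reads: int) -> float:
    """Reads per million: peak read count divided by library size in millions."""
    return read_count / (total_nondup_reads / 1_000_000)


def passes_enrichment(peak: Peak, ip_total: int, input_total: int,
                      min_fold: float = 2.0, min_diff: float = 1.0) -> bool:
    """Apply both thresholds from the text: RPM_IP >= 2 x RPM_input
    and RPM_IP - RPM_input > 1."""
    rpm_ip = rpm(peak.ip_reads, ip_total)
    rpm_input = rpm(peak.input_reads, input_total)
    return rpm_ip >= min_fold * rpm_input and (rpm_ip - rpm_input) > min_diff


# Assumed library sizes (non-duplicated reads); values are made up.
IP_TOTAL = 20_000_000
INPUT_TOTAL = 25_000_000

peaks = [
    Peak("strong", ip_reads=100, input_reads=50),     # 5.0 vs 2.0 RPM
    Peak("low_signal", ip_reads=20, input_reads=5),   # 1.0 vs 0.2 RPM
    Peak("high_input", ip_reads=100, input_reads=75), # 5.0 vs 3.0 RPM
]
kept = [p.name for p in peaks if passes_enrichment(p, IP_TOTAL, INPUT_TOTAL)]
```

In this example only the first peak is kept: the second passes the fold criterion but its RPM difference is below 1, and the third has a large RPM difference but less than twofold enrichment. Requiring both conditions guards against weak peaks with near-zero input signal as well as strong peaks in regions with high background.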
H3K9me3 and H3K9me2 ChIP-seq data processing
Since the H3K9me3 and H3K9me2 marks form broad domains, we called the peaks using Homer with the parameter ‘-region -size 1000’, and peaks within 5 kb were merged together.

Reproducibility of ChIP-seq data
To check the reproducibility of biological replicates, we divided the reference genome into 10-kb bins and computed the number of reads within each bin. The Pearson correlation coefficients between biological replicates for H3K4me3, H3K27ac, H3K9me3 and H3K9me2 were calculated using the normalized 10-kb binned reads. After confirming that all replicates were highly correlated, we pooled the .bam files of biological replicates together with the merge function of Samtools56 for further analysis.

Identification of tissue-specific cis-regulatory elements
We identified tissue-specific cis-regulatory elements based on the union sets of each type of cis-regulatory element. Then we computed the H3K27ac RPM change to represent the active enhancer/active promoter intensity, and the resulting matrix was quantile normalized. The normalized matrix was transformed into Z-scores to identify the tissue-specific elements at a threshold of Z > 2 and a minimum of twofold change in magnitude. GO term analysis of the top 1,000 tissue-specific enhancers or active promoters (ranked by intensity) was performed using GREAT 3.0.0 (ref. 58) after lifting over the genomic coordinates of each type of cis-regulatory element to Zv9/danRer7. Motif analysis was performed by HOMER2. For super-enhancers, only the narrow peaks within super-enhancers were used for the GO term and motif analysis.
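The Z-score criterion for tissue-specific elements (Z > 2 together with a minimum twofold change) can be sketched as follows. The text does not specify how the standard deviation or the fold change is computed, so this sketch makes two labelled assumptions: it uses the population standard deviation across tissues, and it measures fold change against the mean signal in all other tissues. The function name and example values are hypothetical.

```python
from statistics import mean, pstdev


def tissue_specific(values, z_min=2.0, fold_min=2.0):
    """Return indices of tissues in which one element is tissue-specific.

    values: normalized signal of a single element across tissues.
    Assumptions (not stated in the source): population standard deviation
    for the Z-score; fold change relative to the mean of the other tissues.
    """
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return []  # identical signal in all tissues: nothing is specific
    hits = []
    for i, v in enumerate(values):
        others = values[:i] + values[i + 1:]
        background = mean(others)
        z = (v - mu) / sd
        # Multiplicative form avoids dividing by a zero background.
        if z > z_min and v >= fold_min * background:
            hits.append(i)
    return hits


# One element measured in eight hypothetical tissues.
specific = tissue_specific([1, 1, 1, 1, 1, 1, 1, 10])
# High Z-score but only a 1.6-fold change over the other tissues.
not_specific = tissue_specific([5, 5, 5, 5, 5, 5, 5, 8])
```

The first element is called specific to the last tissue, whereas the second is rejected despite an identical Z-score, because its signal is less than twofold above the background. This shows why the fold-change requirement is applied in addition to the Z-score: a Z-score alone measures only relative deviation and can flag small absolute differences when variance across tissues is low.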
Identification of the zebrafish genome blacklist
Genomic experiments based on Illumina sequencing (for example, ChIP-seq and ATAC-seq) often produce artificially high signals in certain genomic regions, such as centromeres, telomeres and satellite repeats. It is therefore essential to identify and remove these artificial signals that exist in ChIP-seq and ATAC-seq experiments. To flag these artificial regions, ChIP-seq input sequencing data were used as IP with the whole genome sequence as background to call the artificial peaks by MACS2 (ref. 57). The peaks with −log10(q-value) less than 5 were filtered out. Then, the ChIP sequencing data were used to call peaks with the genome sequence as background, with the same threshold. The overlapped peaks of these two datasets across all tissues were defined as the zebrafish genome blacklist, and these regions were filtered out for all ChIP-seq and ATAC-seq peak-calling analyses (Supplementary Table 19).

Identification of cis-regulatory elements
To systematically compare ChIP-seq data, we used a 1-kb flanking region of summit peaks to define the promoters and enhancers across all tissues. We observed that, on average, 96% of H3K27ac and H3K4me3 peaks overlapped with representative open chromatin regions (ATAC-seq peaks ± 500 bp). Weak promoters were defined by H3K4me3 peaks that

WGBS data processing
WGBS paired-end reads were mapped to the zebrafish genome assembly GRCz10, as previously described with minor modifications59. To increase the mapping efficiency, the first ten low-quality base pairs of read 1s and the first 15 of read 2s were trimmed along with adaptor sequences using Trim Galore! (The Babraham Institute) version 0.6.1 with the following parameters: --clip_R1 10 --clip_R2 15 --paired --retain_unpaired -r1 21 -r2 21. The trimmed reads were mapped to the in silico bisulfite-converted zebrafish genome reference by using Bismark60 v.0.18.1 with the following parameters: -X 2000 --un -N 1 -L 28. Unpaired or unmapped read 1s were then mapped in single-read mode by using Bismark with the following parameters: -N 1 -L 28. Unpaired or unmapped read 2s were also mapped in single-read mode by using Bismark with the following parameters: --pbat -N 1 -L 28. The redundant reads from PCR amplification were then removed by using the following Bismark command: deduplicate_bismark --bam. The methylation information for individual cytosines was extracted from the deduplicated reads by using Bismark with the following command: bismark_methylation_extractor --comprehensive --merge_non_CpG --gzip. After merging paired-end and single-end extracted files, the Bismark commands
bismark2bedGraph --CX and coverage2cytosine --CX was used to cal- settings for nanopore reads. To improve the accuracy of our assembly,
culate total read count and methylation read count per each C. The we performed four rounds of genome polishing. In the first two rounds,
methylation levels and read coverage of each CpG were visualized on the WashU Epigenome Browser using a methylC track⁶¹.

The mean CpG methylation levels, percentages of CpGs with low, medium, and high methylation levels, and distribution of CpG methylation levels were calculated by using CpGs covered by at least five reads.

UMRs and LMRs
UMRs and LMRs were identified by using MethylSeekR v.1.22.0⁶², following the tool’s recommendations with minor modifications. The random number generator seed of 123 was set at the beginning of the analysis to ensure reproducibility. Partially methylated domains were identified and masked using the smallest chromosome, chromosome 25, as a training set. UMRs and LMRs were identified using a methylation-level cut-off of less than 0.5 and at least 5 or 6 CpGs, ensuring FDRs below 5%. UMRs and LMRs were assigned as active promoters, weak promoters, or active enhancers if they overlapped with these cis-regulatory elements defined in the same tissue by using BEDTools v.2.27.1. UMRs and LMRs were similarly intersected with proximal or distal ATAC-seq peaks.

DMRs
DMRs between two tissues were identified by using DSS v.2.14.0⁶³. First, the mean methylation level of each CpG site was estimated with smoothing⁶⁴ in the DSS package. Then, dispersion at each CpG site was estimated, and the Wald test on each CpG site was performed to calculate the statistical significance of the methylation difference across different samples. Without replicates, DMRs were detected by using the DSS package’s callDMR function with the following parameters: delta = 0.2, p.threshold = 0.05, minlen = 200, minCG = 5, dis.merge = 50, pct.sig = 0.5. Hypomethylated DMRs in a given tissue were further filtered by intersecting with UMRs or LMRs in the same tissue. Tissue-specific hypoDMRs were defined as DMRs hypomethylated in at least 8 out of 10 pairwise comparisons. For the hypoDMRs with a heat map of DNA methylation levels (Fig. 3c), the union set of hypoDMRs was obtained by using BEDtools’ merge function. DNA methylation levels of the union set of hypoDMRs were calculated using CpGs covered by at least five reads. Only regions that exist uniquely in hypoDMRs of each tissue and have DNA methylation levels available across all 11 tissues were used. Gene ontology and wild-type expression enrichment analysis was performed with GREAT v.3.0.0, after converting the genomic coordinates of hypoDMRs into Zv9/danRer7. Heat maps of DNA methylation levels, H3K27ac ChIP-seq and ATAC-seq signals of tissue-specific hypoDMRs along with their neighbouring regions were plotted by using deepTools⁶⁵.

we mapped the nanopore long reads to the contigs using minimap2 (v.2.16-r922) and polished the contigs with Nanopolish (v.0.11.1). In rounds 3 and 4, the 10X Genomics reads (barcodes trimmed) were aligned to the contigs with BWA-MEM (0.7.15-r1140). Duplicated reads were marked by Picard (v.2.17.2). Pilon software⁶⁷ (v.1.23) was used to polish the contigs. 1,325 Nanopore contigs (maximum length = 40.077 Mbp) were generated with N50 of 9.418 Mbp. Next, we applied BioNano Solve v.3.2.1 software (‘non-haplotype’ with ‘no extend split’ and ‘no cut seg dups’) to combine the Nanopore contigs with BioNano optical mapping data to generate 114 chromosome arm-level scaffolds (N50 = 29.725 Mbp and maximum length = 42.585 Mbp). At the last step, we leveraged the brain Hi-C data with 3d-dna software⁶⁸ (default parameters) to generate the final chromosome-level scaffolds with a total length of 1,519 Mbp (N50 = 56.768 Mbp, maximum length = 75.632 Mbp). Structural variation was called by BioNano_Access software for the de novo assembled and GRCz10 reference genomes.

Linking distal ATAC-seq peaks and active enhancers to genes by correlation
We strictly followed previously proposed strategies³⁰. In brief, we first removed ATAC-seq peaks whose normalized RPMs were less than 1 across all tissues. We then computed the Pearson correlation coefficient (PCC) between all ATAC-seq peaks (log2(RPM)) and genes (log2(TPM + 1)) whose TSSs were located within 500 kb of ATAC-seq peaks. To determine the PCC cut-off and estimate FDR, we randomly selected 10,000 ATAC-seq peaks and computed their PCC with genes located on other chromosomes. To compute the P value for each correlation value, we calculated the z-score by comparing it with the mean and the standard deviation of the correlations of all the random ATAC-seq-to-gene pairs. The z-score was converted into a two-tailed P value. We then estimated FDR using the Benjamini–Hochberg procedure. The distal H3K27ac peaks were processed in the same way, with FDR < 0.01 and PCC cut-off = 0.56142.

Motif comparative analysis between zebrafish and human
First, we predicted the motif enrichment in each zebrafish tissue-specific enhancer group using the Homer findMotifGenome function. Then, we merged the enriched motifs in all tissues to generate a tissue-motif matrix. The P value of motifs in each tissue was quantile normalized. The human motif matrix was generated with the same method using human Roadmap ChIP-seq datasets⁶⁹. Then we ranked the motifs by normalized P value, and the top three motifs of each tissue in zebrafish were used to compare with the human motif matrix.
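The statistical part of the peak-to-gene linking procedure described above (z-score against a random background, two-tailed P value, Benjamini–Hochberg FDR) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the background is assumed to be approximately normal, and the sample standard deviation is an assumption because the text does not say which estimator was used.

```python
import math
import statistics


def two_tailed_p_from_background(observed, background):
    """Two-tailed P values for observed correlations, using the mean and
    standard deviation of background (random peak-to-gene) correlations."""
    mu = statistics.mean(background)
    sigma = statistics.stdev(background)  # assumption: sample standard deviation
    pvals = []
    for r in observed:
        z = (r - mu) / sigma
        # Two-tailed tail probability of a standard normal deviate.
        pvals.append(math.erfc(abs(z) / math.sqrt(2)))
    return pvals


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda k: pvals[k])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Pairs would then be kept when the adjusted value falls below the chosen FDR threshold and the correlation exceeds the PCC cut-off.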
Repetitive elements enrichment analysis
We compared the predicted cis-regulatory elements with different subtypes of repetitive elements, including LTR, DNA, SINE, LINE, satellite, unknown and rolling circle (RC), annotated by RepeatMasker to investigate whether some cis-regulatory elements were enriched or depleted of any specific types of repetitive elements. We analysed the enrichment by calculating the number of overlapped base pairs between each cis-regulatory element and subtype of repetitive elements. The equation of calculated fold enrichment is shown below: j represents the H3K27ac peaks in the active enhancer regions/ H3K9me3 peak regions/ H3K9me2 peak regions; i represents the subtype of repetitive elements.

$$\text{Fold enrichment} = \frac{\dfrac{\text{Length of } j \text{ overlap with } i}{\text{Length of } j}}{\dfrac{\text{Length of } i}{\text{Length of genome}}}$$

Genome assembly
The Oxford nanopore sequencing reads of zebrafish Tu (50× coverage) were assembled using Canu software⁶⁶ (v.1.8) with default

Core transcriptional regulatory network analysis
To identify the regulatory transcription factor motifs for each tissue in zebrafish and human, we used HOMER⁷⁰ to analyse the nucleosome-free regions (based on H3K27ac intensity) within the 10-kb flanking regions of TSS sites for 661 transcription factor genes. We next used FIMO⁷¹ to scan motif occupancy in the nucleosome-free regions with P < 1 × 10⁻⁵ as the threshold. Then, we performed the three-node motif network analysis using the previously described method³¹.

Hi-C data processing
Mapping and matrix generation. For Hi-C data, adaptor sequences were trimmed, and low-quality reads were removed. Paired-end reads were mapped to the GRCz10 genome using HiC-Pro v.2.9.0⁷² (https://github.com/nservant/HiC-Pro). Singleton, multi-mapped, dumped, dangling, self-circle paired-end reads, and PCR duplicates were all removed by HiC-Pro after mapping. We generated raw contact matrices at 25-kb, 40-kb, 100-kb, 500-kb and 1-Mb resolutions. Visualization of Hi-C contact matrices was done using juicer tools v.1.8.9⁷³ and juicebox v1.11⁷⁴ (https://github.com/aidenlab/juicer/wiki/Download). HiCRep
was used to compute correlations between replicates³². The cool file was generated by cooler v.0.8.6⁷⁵, and cooltools v.0.4.0 (https://github.com/mirnylab/cooltools) was used to compute the expected and observed values at different resolutions.

Hundreds of such matrices were then aggregated, and the average interaction intensity was calculated for each location of the matrix. In the resulting plot, the centre point corresponds to the centre of each TAD, and boundaries of the centre block represent the average locations of piled-up TAD boundaries.
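The pile-up procedure (observed/expected normalization by genomic distance, extraction of a fixed window of 40 bins on each side of every TAD midpoint, then averaging) can be sketched with NumPy as below. This is a simplified illustration under stated assumptions: the function names are hypothetical, the contact matrix is assumed to be a symmetric single-chromosome array, and the expected value here is the plain mean of each diagonal, whereas the Methods restrict the expected calculation to intra-TAD contacts.

```python
import numpy as np


def observed_over_expected(matrix):
    """Divide each entry by the mean contact frequency at its genomic distance.

    Simplification: uses every entry on a diagonal, not only intra-TAD ones.
    """
    n = matrix.shape[0]
    oe = np.zeros((n, n), dtype=float)
    for d in range(n):
        diag = np.diagonal(matrix, offset=d)
        expected = diag.mean()
        if expected > 0:
            idx = np.arange(n - d)
            oe[idx, idx + d] = diag / expected
            oe[idx + d, idx] = diag / expected  # assumes a symmetric matrix
    return oe


def pile_up(oe, midpoints, flank=40):
    """Average the (2*flank x 2*flank) windows centred on each TAD midpoint bin."""
    n = oe.shape[0]
    windows = []
    for mid in midpoints:
        lo, hi = mid - flank, mid + flank
        if lo < 0 or hi > n:
            continue  # skip TADs too close to the chromosome ends
        windows.append(oe[lo:hi, lo:hi])
    if not windows:
        raise ValueError("no TAD window fits inside the matrix")
    return np.mean(windows, axis=0)
```

With 25-kb bins and `flank=40`, each window is the 80 × 80 matrix spanning 1 Mb on either side of the midpoint, and the TAD sets with and without breakpoints would be passed through `pile_up` separately.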
A/B compartment. A/B compartment analysis was performed at 40-kb, 100-kb and 250-kb resolutions using HOMER software. The positive eigenvalues were set to A compartments, and negative values were set to B compartments. We identified the regions with changes in sign of the PC1 value between muscle and brain as A/B compartment switched regions.

Directionality index calculation. The directionality index of the 40-kb binned raw Hi-C matrix was calculated as previously described³⁶.

Insulation, boundary calculation and TAD calling. The TAD structure (insulation/boundaries) was defined by the insulation score as in previous studies⁷⁶,⁷⁷. The matrices used to calculate the insulation score were normalized by the ICE method⁷⁸ to remove the bias of the raw matrices. The insulation score of the ICE matrix was calculated with the following parameters: -is 480,000 -ids 320,000 -im iqrMean -ss 160,000.

Hi-C loop calling. Loops were computed from Hi-C matrices using HiCCUPS⁴³, part of the juicer tools package, with the parameters ‘--ignore_sparsity -r 10000 -k KR -f .1,.1,.1 -p 4,2,1 -i 7,5,3 -t 0.02,1.5,1.75,2 -d 20000,20000,50000’, as previously described. Consistent loops were identified using the pairtopair function in BEDtools with the parameter ‘-type both -f 0.5’. Tissue-specific loops were identified using the pairtopair function in BEDtools with the parameter ‘-type notboth -slop 40000’.

Identification of evolutionary breakpoints distribution in TADs
We used progressiveMauve⁷⁹ to identify the genomic rearrangement breakpoints between zebrafish and human, gibbon, chimp, gorilla, mouse, cat, dog, pig, sheep, cattle, chicken, zebra finch, Xenopus, platyfish, stickleback, tilapia, medaka and fugu, respectively. To calculate the breakpoint density in TADs, we divided the zebrafish genome into 100-kb bins and computed the number of overlapped breakpoints. We plotted the profile figure using the computeMatrix and plotProfile packages from deepTools, with a TAD length of 2,800 kb (median size of the TADs) and boundary length of 40 kb. We defined TADs with breakpoints using the bedtools intersect function with parameter ‘-f 0.3’.

Single-cell analysis
We generated 781,123,374 reads from 23,871 cells using the 10X Genomics Chromium single cell ATAC-seq solution protocol. The BCL files generated from sequencing were used as inputs to the 10X Genomics Cell Ranger ATAC-seq pipeline; then the FASTQ files were aligned to the GRCz10 genome using BWA⁸⁰. Fragments with MAPQ > 30 were kept for further analysis, and each fragment is associated with a single-cell barcode. To assess the quality of the scATAC-seq data, we calculated the Pearson correlation coefficient between the aggregated signal of scATAC-seq and bulk ATAC-seq. The results showed that the single-cell data were highly correlated with bulk ATAC-seq (Fig. 2g). Then a cell-by-bin matrix was generated by segmenting the genome into 5-kb windows and scoring each cell for reads in each window. High-quality cells were kept if their log10(UMI) value was between 3 and 5 and their fraction of reads in promoters was between 10% and 65% (Extended Data Fig. 4a). Finally, 19,955 cells passed quality control for further analysis. We used the SnapATAC (https://github.com/r3fang/SnapATAC) method to reduce the dimensionality of the dataset and the deep neural network-based scAlign⁸¹ to remove the batch effect of two replicates. We then identified 25 clusters using graph-based clustering. To identify cell-type-specific regulatory elements, we called peaks on aggregated single cells from each cluster using MACS2. Finally, we identified a total of 268,268 non-redundant peaks, of which 195,004 peaks were also identified as peaks in bulk ATAC-seq data. A total of 171,159 cluster-specific differential peaks (DA peaks) were identified by Fisher’s exact test, and the threshold was set as FDR < 0.05. To determine the cell type of each cluster, we performed motif analysis based on known transcription factor binding motifs in DA peak regions of each cluster using HOMER. We used ChromVAR⁸² to estimate bias-corrected deviations of transcription factor binding motif enrichment, and the result was consistent with the top motif enrichments for DA peaks of each cluster.

Statistics
All box plots in main and extended data figures were plotted using R and Python. In box plots, the horizontal line shows the median, the box encompasses the interquartile range, and whiskers extend to 5th and 95th percentiles.
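The barcode quality filter described in the single-cell analysis (log10(UMI) between 3 and 5, fraction of reads in promoters between 10% and 65%) amounts to a simple per-cell predicate. The sketch below is illustrative only: the function name and argument names are hypothetical, and treating the bounds as inclusive is an assumption, since the text does not say whether the thresholds are strict.

```python
import math


def passes_qc(n_umi, promoter_fraction,
              log_umi_range=(3.0, 5.0), promoter_range=(0.10, 0.65)):
    """Return True if a barcode passes both thresholds.

    Assumption: both ranges are inclusive at either end.
    """
    if n_umi <= 0:
        return False  # log10 is undefined; such barcodes cannot pass
    log_umi = math.log10(n_umi)
    return (log_umi_range[0] <= log_umi <= log_umi_range[1]
            and promoter_range[0] <= promoter_fraction <= promoter_range[1])
```

Applied to every barcode, a filter of this form would reduce the initial cell set to the subset carried forward for dimensionality reduction and clustering.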
H3K4me3 pattern analysis of pile-up TADs
To integrate histone signals and cis-regulatory element intensity in TADs, we divided the zebrafish genome into 100-kb bins. For H3K4me3, we computed the H3K4me3 per base-pair signal in muscle within each bin. We used computeMatrix and plotProfile packages from deepTools to plot the density of H3K4me3. To compare the expression similarity of the genes in TADs with breakpoints and TADs without breakpoints, we calculated the Spearman correlation coefficients of gene expression TPM across eleven tissues between zebrafish and human.

Interaction pattern analysis of pile-up TADs
To visualize the overall interaction patterns within TADs, we performed a pile-up analysis similar to APA plotting⁴³. We first normalized the interaction matrix by dividing each entry of the raw 25-kb matrix by the expected interaction frequency at the corresponding genomic distance. To make the TAD boundary sharper, we only considered the whole-genome intra-TAD interaction frequencies and excluded inter-TAD interactions when we calculated the expected values. Then the TAD sets with and without breakpoints were separately piled up and averaged over the normalized matrix. Specifically, for each TAD of interest, we extended fixed 40 bins (1 Mb) on both sides of the midpoint and extracted the 80 × 80 matrix within the extended region.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
All the sequencing data are deposited in the NCBI Gene Expression Omnibus under accession code GSE134055. All the genomic data generated in this study can be visualized in the WashU Epigenome Browser (https://epigenome.wustl.edu/zebrafishENCODE/). The human histone-modification ChIP-seq data were downloaded from the ROADMAP Project. The mouse histone modification ChIP-seq data were downloaded from the mouse ENCODE Consortium. The human tissue transcriptome data were downloaded from the GTEx Consortium. The public zebrafish ChIP-seq and ATAC-seq data used in this study are listed in Supplementary Table 6. The human h1-ESC Hi-C data were downloaded from GSE52457. GM12878 and K562 GRO-seq data were downloaded from GSE60456. GM12878 and K562 CTCF ChIP-seq were downloaded from GSE31477. GM12878 and K562 Pol2 ChIP-seq were downloaded from GSE91426 and GSE31477. Source data are provided with this paper.
42. Pedroso, G. L. et al. Blood collection for biochemical analysis in adult zebrafish. J. Vis. Exp. 3865, e3865 (2012).
43. Rao, S. S. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
44. Xie, W. et al. Epigenomic analysis of multilineage differentiation of human embryonic stem cells. Cell 153, 1134–1148 (2013).
45. Kim, D. et al. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 14, R36 (2013).
46. Trapnell, C. et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010).
47. Wang, L. et al. CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model. Nucleic Acids Res. 41, e74 (2013).
48. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
49. Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
50. Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 10–12 (2011).
51. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
52. Li, Z. et al. Identification of transcription factor binding sites using ATAC-seq. Genome Biol. 20, 45 (2019).
53. Korhonen, J., Martinmäki, P., Pizzi, C., Rastas, P. & Ukkonen, E. MOODS: fast search for position weight matrix matches in DNA sequences. Bioinformatics 25, 3181–3182 (2009).
54. Kulakovskiy, I. V. et al. HOCOMOCO: towards a complete collection of transcription factor binding models for human and mouse via large-scale ChIP-seq analysis. Nucleic Acids Res. 46 (D1), D252–D259 (2018).
55. Gerstein, M. B. et al. Integrative analysis of the Caenorhabditis elegans genome by the modENCODE project. Science 330, 1775–1787 (2010).
56. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
57. Liu, T. Use model-based Analysis of ChIP-Seq (MACS) to analyze short reads generated by sequencing protein-DNA interactions in embryonic stem cells. Methods Mol. Biol. 1150, 81–95 (2014).
58. Hiller, M. et al. Computational methods to detect conserved non-genic elements in phylogenetically isolated genomes: application to zebrafish. Nucleic Acids Res. 41, e151 (2013).
59. Lee, H. J. et al. Regenerating zebrafish fin epigenome is characterized by stable lineage-specific DNA methylation and dynamic chromatin accessibility. Genome Biol. 21, 52 (2020).
60. Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).
61. Zhou, X., Li, D., Lowdon, R. F., Costello, J. F. & Wang, T. methylC Track: visual integration of single-base resolution DNA methylation data on the WashU EpiGenome Browser. Bioinformatics 30, 2206–2207 (2014).
62. Burger, L., Gaidatzis, D., Schübeler, D. & Stadler, M. B. Identification of active regulatory regions from DNA methylation data. Nucleic Acids Res. 41, e155 (2013).
63. Wu, H. et al. Detection of differentially methylated regions from whole-genome bisulfite sequencing data without replicates. Nucleic Acids Res. 43, e141 (2015).
64. Hansen, K. D., Langmead, B. & Irizarry, R. A. BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biol. 13, R83 (2012).
65. Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44 (W1), W160–W165 (2016).
66. Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
67. Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963 (2014).
68. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
69. Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
70. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 38, 576–589 (2010).
71. Grant, C. E., Bailey, T. L. & Noble, W. S. FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017–1018 (2011).
72. Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 259 (2015).
73. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
74. Robinson, J. T. et al. Juicebox.js provides a cloud-based visualization system for Hi-C data. Cell Syst. 6, 256–258 (2018).
75. Abdennur, N. & Mirny, L. A. Cooler: scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics 36, 311–316 (2020).
76. Crane, E. et al. Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature 523, 240–244 (2015).
77. Giorgetti, L. et al. Structural organization of the inactive X chromosome in the mouse. Nature 535, 575–579 (2016).
78. Imakaev, M. et al. Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods 9, 999–1003 (2012).
79. Darling, A. E., Mau, B. & Perna, N. T. progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS ONE 5, e11147 (2010).
80. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
81. Johansen, N. & Quon, G. scAlign: a tool for alignment, integration, and rare cell identification from scRNA-seq data. Genome Biol. 20, 166 (2019).
82. Schep, A. N., Wu, B., Buenrostro, J. D. & Greenleaf, W. J. chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data. Nat. Methods 14, 975–978 (2017).

Acknowledgements This work was supported by NIH grants R35GM124820, R01HG009906, R24DK106766 (R.C.H. and F.Y.) and R01DK107735 (G.S.G.). F.Y. is also supported by U01CA200060. T.W. is supported by NIH grants R01HG007175, R01HG007354, R01ES024992, U24ES026699 and U01HG009391. We thank J. A. Stamatoyannopoulos for discussion and suggestions; H. Lyu for proof reading and other Yue lab members for discussion; and E. DeForest, S. Stella, P. Hubley and Penn State Zebrafish Functional Genomics Core for fish husbandry and embryo collection.

Author contributions F.Y. conceived and supervised the project. H.Y. and T.L. collected tissue and conducted experiments. Y.L. led the data analysis. Y.L., H.Y., H.J.L., Y.W., X.W., B.Z., L.F. and J.W. conducted analyses. D.L. and T.W. provided the website for data presentation. K.C.A. and K.C.C. provided animal support. Q.J., X.X., J.X., F.S., I.S., C.K., T.S., M.N.K.C., J.T., K.W., G.S.G., R.C.H., T.W. and K.C.C. helped with data interpretation. H.Y., Y.L., T.L. and F.Y. prepared the manuscript with input from all authors.

Competing interests The authors declare no competing interests.

Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2962-9.
Correspondence and requests for materials should be addressed to F.Y.
Peer review information Nature thanks Michael Beer, Jesse Dixon and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Extended Data Fig. 1 | Tissue-specific gene expression in zebrafish. a, Clustering analysis of transcripts from RNA-seq data in embryonic and adult tissues (n = 31,842). b, c, Gene Ontology and KEGG pathway analysis for the tissue-specific genes in adult brain, heart and testis (the number of tissue-specific genes in these two figures are, Brain = 3,693, Heart = 392, Testis = 1,605). d, Distribution of H3K4me3 signals surrounding the known and predicted novel transcripts. e, Human orthologues of zebrafish tissue-specific genes were more tissue-specific compared to human orthologues of non-tissue-specific zebrafish genes (n = 14,764, 3,739, 6,043, Mann–Whitney U test, two-sided, ***P < 2.2 × 10⁻¹⁶).
Extended Data Fig. 2 | Comparative analysis of zebrafish cis-regulatory elements. a, Comparison of the predicted regulatory elements identified with previous data. Enhancers were based on H3K27ac signals in the same four tissues (brain, heart, intestine, testis) from Perez-Rico et al. 2017. The data we generated are from Tübingen zebrafish strain and the published results were from the AB strain. b, Number of predicted cis-regulatory elements in each tissue. E-brain stands for 1 dpf embryonic neuron cells. E-trunk stands for 1 dpf zebrafish whole trunk region. c, An example showing genes with active promoters have higher expression level. Blue hollow bar indicates the known mrpl39 promoter. Orange hollow bar indicates the potential novel promoter. The mrpl39 promoter has H3K4me3 peaks in both muscle and brain, but only has strong H3K27ac signals in muscle and its expression is higher (4.43-fold). d, Gene Ontology results for the muscle-specific enhancers and skin-specific enhancers. We used the GREAT tool for this analysis (the numbers of tissue-specific enhancers used in this figure are muscle = 813, skin = 512).
Extended Data Fig. 3 | Enhancer reporter assay for tissue-specific enhancers. In total, 28 of 32 predicted tissue-specific enhancers showed consistent GFP signals in the corresponding tissues. For the eight brain enhancers tested, 63/95, 51/86, 85/119, 112/143, 27/45, 34/48, 27/41, 62/77, and 37/45 embryos, respectively, had green signals in the brain region. For the six tested heart enhancers, 64/94, 52/85, 79/121, 20/41, 51/95, 32/55 and 20/31 embryos, respectively, had green signals in the heart region. For the six tested muscle enhancers, 52/57, 26/30, 107/124, 53/63, 93/114, 61/67 and 66/78 embryos, respectively, had green signals in the trunk muscle. For the four selected kidney enhancers, 47/82, 35/67, 44/62, 15/42 and 56/110 embryos, respectively, had green signals in the kidney region.
Extended Data Fig. 4 | Single-cell ATAC-seq in zebrafish brain. a, Barcode selection of single cell ATAC-seq. The x-axis represents the log value of the number of unique molecular identifiers (UMI); the y axis represents the ratio of fragments in promoter regions; the red lines represent threshold, and the grey shadows represent that the barcode passed the filter. b, Genomic distribution of all differentially accessible (DA) peaks. c, Overlap of all differentially accessible peaks with enhancers predicted in bulk brain. d, Top, the cluster distribution in the tSNE projection. Bottom left, pileups of differentially accessible ATAC-seq signals for each cluster. Shown in the figure is the +/− 10kb flanking region surrounding peak centres. Bottom right, most significantly enriched transcription factor motif for each cluster. e, t-SNE projection of all scATAC-seq cells colored by Z-score of peak enrichment. f, Motif enrichment of known neuron-specific TFs in scATAC-seq predicted clusters (n = 19,955).
Extended Data Fig. 5 | Heterochromatin annotation in adult tissues. a, WashU Epigenome Browser screenshot of H3K9me3 and H3K9me2 histone ChIP-seq signals in 11 zebrafish adult tissues. The values on the y-axis were input-normalized. b, Distribution of H3K9me3 and H3K9me2 sites in the zebrafish genome. c, Venn diagram shows the overlap between H3K9me3 and H3K9me2 sites in zebrafish genome. d, Overlapping percentile of H3K9me3 and H3K9me2 peaks in adult tissues. e, H3K9me3 and H3K9me2 sites were depleted of ATAC-seq, H3K4me3 and H3K27ac ChIP-seq signals (n = 68,789 H3K9me3 sites and n = 73,777 H3K9me2 sites). f, Overlap of H3K9me3 sites, H3K9me2 sites, and ATAC-seq peaks with repetitive elements (The total number of each bar, from left to right, 68,789, 73,777 and 436,036). g, Examples of H3K9me3 sites in one tissue found to be active regions in other tissues. Horizontal scale 0-20 for H3K27ac and H3K4me3, 0-10 for RNA-seq, 0-5 for H3K9me3 and H3K9me2.
Extended Data Fig. 6 | DNA methylation level and distribution in adult tissues. a, Fraction of total CpGs with low (<25%), medium (≥25% and <75%), and high (≥75%) methylation levels and mean CpG methylation levels (mCG/CG) in zebrafish adult tissues (the mCG/CG ratio, from left to right, 0.788, 0.859, 0.790, 0.777, 0.791, 0.797, 0.781, 0.777, 0.804, 0.789, 0.781). b, Distribution of CpG methylation levels across zebrafish adult tissues. c, The distribution of non CpG methylation in 11 adult tissues. d, Mean methylation levels of the tissue-specific gene promoters. n represents the number of tissue-specific gene promoter. e, Mean methylation level of CpGs overlapping different genomic features or repetitive element classes. CDS, coding sequence. f, Number of UMRs and LMRs in zebrafish tissues and their overlap with enhancer and promoters (left panel) (number of UMR and LMR, from top to bottom, 14,990, 10,569, 14,569, 14,587, 14,831, 14,289, 13,842, 13,569, 14,424, 14,374, 13,908, 30,009, 7,916, 19,038, 21,411, 22,591, 16,796, 14,961, 16,268, 17,481, 15,932, 15,665) and ATAC-seq peaks (right panel) (numbers of UMR and LMR are the same with left panel). g, Clustering of tissue-specific hypoDMRs. Values in the heat map are mean methylation levels of hypoDMRs (n = 17,654, number of tissue-specific hypoDMRs).
Extended Data Fig. 7 | De novo assembly of zebrafish chromosome 4 of the Tübingen strain. a, WashU Epigenome Browser snapshot showing that heterochromatic marks H3K9me2 and H3K9me3 signals were enriched on chromosome 4 in zebrafish testis. The values on the y-axis were input-normalized. b, H3K9me2, H3K9me3, and DNA methylation level on chr4 long arm are significantly higher than other regions in all tissues (n = 11, two-sided, t-test). c, Overall strategy of de novo assembly of the Tübingen chr4 by integrating 10X, Nanopore, Bionano, and Hi-C data. d, Bionano long molecule sequencing data shows that there were many SVs on chr4 when mapped to the GRCz11 reference genome. e, SVs on chr4 detected by Bionano when the data were mapped to the de novo assembled chr4.
Extended Data Fig. 8 | Conservation of cis-regulatory elements from zebrafish to other vertebrates. a, Percentage of zebrafish enhancers whose sequences were conserved in human (the number of each bar, from left to right, 13,307, 7,018, 11,940, 7,499, 14,783, 14,272, 8,995, 13,777, 10,757, 15,505, 1,734, 4,011, 5,247). b, c, Similar to Fig. 4a. Percentage of zebrafish exons and cis-regulatory elements that have orthologous sequences in mouse and other fish species. Total number of each bar, from left to right: 1,000, 25,593, 58,065, 1,000. For exons and random, we randomly sample 1000 elements and computed their conservation percentage. The simulations were performed 20 times and the average percentage was presented. d, Another example of ultra-conserved noncoding element (UCNE). This element (FOXP1_Finn_1) is predicted to be a muscle enhancer in zebrafish, mouse, and human. Grey vertical bar marks the ultra-conserved region. Red vertical bar is the enhancer sequence in the human genome that was validated as a limb enhancer by transgenic mouse reporter assay in the VISTA Enhancer Browser (#hs956).
Extended Data Fig. 9 | Distal ATAC-seq peak-to-gene pairs, enhancer-to-gene pairs, and transcriptional regulation network. a, b, Distance distribution of cis-regulatory elements to their linked gene TSS. c, Correlation of ATAC-seq peak-to-gene pairs and Enhancer-to-gene pairs (n from left to right = 3,292, 3,827, 3,544, 3,281, 3,008, 2,795, 2,357, 2,001, 1,106). d, Validation of predicted enhancer-to-gene pairs by Hi-C interaction counts in muscle. e, mef2d is a regulator in both zebrafish muscle and heart, but it regulates different downstream targets by motif prediction analysis. f, The overall structure of the regulatory network is conserved between human and zebrafish. FFL connection analysis was performed, in this analysis, there are three types of nodes: A, driver node that regulates B and C; B, middle node, regulated by A but regulating node C; C, passenger node, regulated by both A and B.
Extended Data Fig. 10 | Compartment and TADs in zebrafish. a, Heat map of genome-wide Hi-C interaction matrices in zebrafish brain (blue) and muscle (red). b, Active marks (H3K4me3, H3K27ac, and ATAC-seq) were enriched in compartment A and depleted in compartment B. Repressive marks (H3K9me2 and H3K9me3) were enriched in compartment B. Error bands represent standard error of the mean. c, Genome browser snapshot of A/B compartment in brain and muscle. The blue vertical shaded area marks a region that is located in compartment B in brain but in compartment A in muscle. As expected, the A compartment is associated with more ATAC-seq peaks, H3K27ac and RNA-seq signals. d, Examples of shared TADs between zebrafish brain and muscle. e, Average DI scores surrounding TAD boundaries identified in brain (upper panel) and muscle (lower panel). f, ChIP-seq data shows that CTCF binding sites were enriched at TAD boundaries. g, Footprint analysis of ATAC-seq peaks in the TAD boundaries shows enrichment of CTCF binding motif (number of each bar, from left to right, 0.213, 0.24, 0.22, 0.237, 0.251, 0.232, 0.24, 0.262, 0.271, 0.281, 0.37, 0.27, 0.253, 0.25, 0.252, 0.253, 0.26, 0.23, 0.238, 0.24, 0.22). h, Repetitive elements enriched at TAD boundaries (left panel) and loop anchors (right panel).
Extended Data Fig. 11 | Comparing zebrafish evolutionary breakpoints with TAD annotation. a, Similar to Fig. 5d. Enrichment of evolutionary breakpoints at TAD boundaries. Relative positions of evolutionary breakpoints to TADs in 15 vertebrates. In all cases, we found that the evolutionary breakpoints were enriched at zebrafish TAD boundaries and depleted from the centre of TADs. Grey vertical bar labels the TAD body area. b, By comparing zebrafish with 17 vertebrates, H3K4me3 signals were found to be more enriched at TAD boundaries with breakpoints than those without breakpoints. Orange vertical bar labels the TAD boundaries. c, Higher H3K4me3 levels at breakpoint-containing TAD boundaries when using TADs annotation from zebrafish muscle were found as well, similar to Fig. 5g. d, H3K4me3 enrichment in human ESCs (H1) TAD boundaries with or without zebrafish-to-human breakpoints. e, H3K4me3 enrichment in mouse ESCs TAD boundaries with or without zebrafish-to-mouse breakpoints. f, H3K4me3 enrichment in human ESCs (H1) TAD boundaries with or without mouse-to-human breakpoints.
Extended Data Fig. 12 | TADs with and without breakpoints. a, H3K27ac and ATAC-seq signals do not show differences at TAD boundaries with breakpoints compared to those without breakpoints. Orange vertical bar labels the TAD boundaries. b, Sizes of TADs with and without evolutionary breakpoints were similar (n = 573, 777, two-sided, t-test). c, Enrichment of transcription at breakpoints (BP) that overlap with CTCF TAD boundaries in K562 cells (the number of breakpoints in blue line is 639, red line is 625). d, In 17 vertebrates, TADs without evolutionary breakpoints (bottom panel) have stronger interaction frequencies in the middle than TADs with evolutionary breakpoints (upper panel). Breakpoints in these 17 vertebrates were defined by comparing their genomes to the zebrafish genome. e, Distribution of correlations between the expression pattern of each pair of paralogs across 11 adult zebrafish tissues. f, Correlations between pairs of paralogs located on the same chromosome. Among them, 17 pairs were located within the same TAD, and the rest of the 65 pairs were located in different TADs. As a control, we randomly sampled 100 genes. Number of each bar, from left to right, 17, 65, 100.
Corresponding author(s): Yue
Last updated by author(s): 07/05/2020
Reporting Summary
Statistics
n/a Confirmed

Software and code

Data collection Images of reporter assays in zebrafish embryos were collected using a Zeiss SteREO Discovery.V8 microscope with AxioVision Rel. 4.8 software. All
public zebrafish ChIP-seq and ATAC-seq datasets used in this study are listed in Supplementary Table 6.
Data analysis The ENCODE histone ChIP-seq pipeline (https://github.com/ENCODE-DCC/chip-seq-pipeline2)
The ENCODE ATAC-seq pipeline (https://github.com/ENCODE-DCC/atac-seq-pipeline)
RepeatMasker 4.1.0
bowtie 1.2.2
Trim Galore 0.6.0 (https://github.com/FelixKrueger/TrimGalore).
macs2 2.2.4
bedtools v2.29.0
Bowtie2 2.3.4
Samtools 1.7
STAR 2.6.0
RSEM 1.3.0
April 2020
Deeptools 3.3.1
GREAT 3.0.0
Tophat2 2.1.0
Cufflinks 2.2.1
CPAT v1.2.3
TransDecoder V5.1.0 (https://transdecoder.github.io/)
blast/2.7.1
biomaRt 2.36.0
Cutadapt 2.5
picard 1.126 (http://broadinstitute.github.io/picard/).
ChIPseeker 1.19.1

Bismark 0.18.1
MethylseekR 1.22.0
DSS v2.14.0
Homer 4.10
FIMO 5.0.5
Hi-C pro 2.11.1 (https://github.com/nservant/HiC-Pro)
cooler 0.8.6 (https://github.com/mirnylab/cooler)
cooltools v.0.4.0 (https://github.com/mirnylab/cooltools)
HiCrep 0.4.0
progressiveMauve 2.4.0
BWA 0.7.17
cell ranger 3.1.0
snapATAC 1.0.0 (https://github.com/r3fang/SnapATAC)
scAlign 1.0
ChromVAR 1.6.0
HINT v0.13.0
Juicer 1.8.9 (https://github.com/aidenlab/juicer/wiki/Download)
Juicebox 1.11.08
Bionano_Access1.5.1
Bionano_Solve Solve3.5.1_01142020
3d-dna 180419
lastz version 1.04.00
Canu Version 2.0 (https://github.com/marbl/canu)
Pilon Version 1.23
limma 3.45.9
MOODS 1.9.4 49(https://github.com/jhkorhonen/MOODS)
minimap2 2.16-r922
Nanopolish 0.11.1
BWA-MEM 0.7.15-r1140
Picard 2.17.2
Data
Next-generation sequencing data have been deposited in Gene Expression Omnibus (GEO) under the following accession number: GSE134055
The genome browser link: https://epigenome.wustl.edu/zebrafishENCODE/
The human h1-ESC Hi-C data were downloaded from GSE52457.
GM12878 and K562 GRO-seq data were downloaded from GSE60456.
GM12878 and K562 CTCF ChIP-seq were downloaded from GSE31477.
GM12878 and K562 Pol2 ChIP-seq were downloaded from GSE91426 and GSE31477.

Sample size The number of adult zebrafish used for each tissue was determined in order to obtain enough tissue for 4 ChIP-seq, RNA-seq, ATAC-seq,
WGBS and Hi-C experiments. In all, 160 datasets, including 11 WGBS, 26 RNA-seq, 95 ChIP-seq, 22 ATAC-seq, 4 Hi-C and 2 single-cell ATAC-seq datasets,
were used in this study.
Data exclusions This is not relevant since we used all sequencing data in this study.
Replication We have two replicates for each ChIP-seq (only one replicate for Kidney H3K27ac), RNA-seq, ATAC-seq, Hi-C and single-cell ATAC-seq experiment. We
performed two technical replicates for whole-genome bisulfite sequencing (WGBS) and reached 30X coverage. We calculated the Pearson
correlation coefficient between two biological replicates using the read counts of 10 kb-binned matrices. The correlation scores of all
replicates are listed in Supplementary Table 1.
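The replicate comparison described above can be illustrated with a minimal sketch. It assumes that reads are reduced to 0-based start positions on a single chromosome and counted in fixed 10 kb bins; the function names `bin_read_counts` and `pearson` are hypothetical and are not taken from the authors' pipeline, which may bin and normalize differently.

```python
import math


def bin_read_counts(read_starts, chrom_length, bin_size=10_000):
    """Count reads per fixed-size bin from 0-based read start positions."""
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length count vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy positions for two replicates on a hypothetical 30 kb chromosome.
    rep1 = bin_read_counts([0, 9_999, 10_000, 25_000], chrom_length=30_000)
    rep2 = bin_read_counts([5, 120, 300, 15_000, 29_000], chrom_length=30_000)
    print(rep1, rep2, pearson(rep1, rep2))
```

In practice the binned vectors would span the whole genome, and the same function would be applied to each pair of replicates.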

Randomization This is not relevant since we did not use different experimental groups or conditions in our study.
Blinding Blinding was not relevant to our study since we did not have experimental groups to compare.


Antibodies ChIP-seq
Clinical data
Antibodies
Antibodies used Rabbit polyclonal histone H3K27ac antibody (Active Motif, 39133) 1:100
Rabbit polyclonal histone H3K4me3 antibody (EMD Millipore, 07-473) 1:100
Rabbit monoclonal histone H3K9me2 antibody (Cell Signaling, 4658) 1:100
Rabbit polyclonal histone H3K9me3 antibody (Abcam, ab8898) 1:100
Validation The four primary antibodies are commercial antibodies against histone H3 modifications, validated as ChIP
grade by the manufacturers (Active Motif, EMD Millipore, Cell Signaling and Abcam):
https://www.activemotif.com/catalog/details/39133
https://www.emdmillipore.com/US/en/product/Anti-trimethyl-Histone-H3-Lys4-Antibody,MM_NF-07-473
https://www.cellsignal.com/products/primary-antibodies/di-methyl-histone-h3-lys9-d85b4-xp-rabbit-mab/4658
https://www.abcam.com/histone-h3-tri-methyl-k9-antibody-chip-grade-ab8898.html
Furthermore, these antibodies have been validated by various labs for ChIP-seq in zebrafish and in many other species that
have the exact same amino acid sequence (PMIDs: 31794598, 24286030, 31495082, 30948728).

Laboratory animals All adult tissues were collected from one-year-old fish of the Tuebingen strain ordered from ZIRC. Sex information is
provided in Supplementary Table 17. Tg(huc:kaede) fish were crossed with Tuebingen fish to obtain kaede-labelled neurons.
Wild animals The study did not involve animals in the wild.
Field-collected samples The study did not involve animals collected from the field.
Ethics oversight All procedures on live animals were approved by the Institutional Animal Care and Use Committee (IACUC) at the Pennsylvania State
University, ID: PRAMS201445659
ChIP-seq
Data deposition
Confirm that both raw and final processed data have been deposited in a public database such as GEO.
Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks.
Data access links GEO_number: GSE134055

Files in database submission GSE134055_YueLab-ChIP-seq-Brain_H3K27ac_merged.narrowPeak.gz

GSE134055_YueLab-ChIP-seq-Brain_H3K4me3_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-Colon_H3K27ac_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-Colon_H3K4me3_merged.narrowPeak.gz

GSE134055_YueLab-ChIP-seq-Heart_H3K27ac_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-Heart_H3K4me3_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-Intestine_H3K27ac_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-Intestine_H3K4me3_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-Kidney_H3K27ac_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-Kidney_H3K4me3_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-Liver_H3K27ac_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-Liver_H3K4me3_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-Muscle_H3K27ac_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-Muscle_H3K4me3_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-Skin_H3K27ac_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-Skin_H3K4me3_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-Spleen_H3K27ac_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-Spleen_H3K4me3_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-Testis_H3K27ac_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-Testis_H3K4me3_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-embryonic-Brain_H3K27ac_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-embryonic-Brain_H3K4me3_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-embryonic-Trunk_H3K27ac_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-embryonic-Trunk_H3K4me3_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-embryonic-Blood_H3K27ac_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-embryonic-Blood_H3K4me3_merged.narrowPeak.gz
GSE134055_YueLab-ATAC-seq-Brain.narrowPeak.gz
GSE134055_YueLab-ATAC-seq-Liver.narrowPeak.gz
GSE134055_YueLab-ATAC-seq-Muscle.narrowPeak.gz
GSE134055_YueLab-ATAC-seq-Skin.narrowPeak.gz
GSE134055_YueLab-ATAC-seq-Spleen.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-Brain_H3K9me3.peak.gz
GSE134055_YueLab-ChIP-seq-Brain_H3K9me2.peak.gz
GSE134055_YueLab-ChIP-seq-Colon_H3K9me3.peak.gz
GSE134055_YueLab-ChIP-seq-Colon_H3K9me2.peak.gz
GSE134055_YueLab-ChIP-seq-Heart_H3K9me3.peak.gz
GSE134055_YueLab-ChIP-seq-Heart_H3K9me2.peak.gz
GSE134055_YueLab-ChIP-seq-Intestine_H3K9me3.peak.gz
GSE134055_YueLab-ChIP-seq-Intestine_H3K9me2.peak.gz
GSE134055_YueLab-ChIP-seq-Kidney_H3K9me3.peak.gz
GSE134055_YueLab-ChIP-seq-Kidney_H3K9me2.peak.gz
GSE134055_YueLab-ChIP-seq-Liver_H3K9me3.peak.gz
GSE134055_YueLab-ChIP-seq-Liver_H3K9me2.peak.gz
GSE134055_YueLab-ChIP-seq-Muscle_H3K9me3.peak.gz
GSE134055_YueLab-ChIP-seq-Muscle_H3K9me2.peak.gz
GSE134055_YueLab-ChIP-seq-Skin_H3K9me3.peak.gz
GSE134055_YueLab-ChIP-seq-Skin_H3K9me2.peak.gz
GSE134055_YueLab-ChIP-seq-Spleen_H3K9me3.peak.gz
GSE134055_YueLab-ChIP-seq-Spleen_H3K9me2.peak.gz
GSE134055_YueLab-ChIP-seq-Testis_H3K9me3.peak.gz
GSE134055_YueLab-ChIP-seq-Testis_H3K9me2.peak.gz
GSE134055_YueLab-ChIP-seq-embryonic-Brain_H3K9me3.peak.gz
GSE134055_YueLab-ChIP-seq-embryonic-Brain_H3K9me2.peak.gz
GSE134055_YueLab-ChIP-seq-embryonic-Trunk_H3K9me3.peak.gz
GSE134055_YueLab-ChIP-seq-embryonic-Trunk_H3K9me2.peak.gz
GSE134055_YueLab-ChIP-seq-embryonic-Blood_H3K9me3.peak.gz
GSE134055_YueLab-ChIP-seq-embryonic-Blood_H3K9me2.peak.gz
Genome browser session https://epigenome.wustl.edu/zebrafishENCODE/

Methodology
Replicates (sample, number of replicates, correlation between replicates)
Brain_RNA-seq 2 0.888
Blood_RNA-seq 2 0.946
Colon_RNA-seq 2 0.908
Heart_RNA-seq 2 0.849
Intestine_RNA-seq 2 0.853
E-brain_RNA-seq 2 0.911
E-trunk_RNA-seq 2 0.875
Kidney_RNA-seq 2 0.91
Liver_RNA-seq 2 0.871
Muscle_RNA-seq 2 0.871
Skin_RNA-seq 2 0.916
Spleen_RNA-seq 2 0.852
Testis_RNA-seq 2 0.953
Brain_h3K27ac 2 0.97
Colon_h3K27ac 2 0.93
Blood_h3K27ac 2 0.91
Heart_h3K27ac 2 0.97

Intestine_h3K27ac 2 0.81
E-brain_h3K27ac 2 0.97
E-trunk_h3K27ac 2 0.98
Kidney_h3K27ac 1
Liver_h3K27ac 2 0.97
Muscle_h3K27ac 2 0.91
Skin_h3K27ac 2 0.96
Spleen_h3K27ac 2 0.91
Testis_h3K27ac 2 0.93
Brain_h3k4me3 2 0.97
Colon_h3k4me3 2 0.93
Blood_h3k4me3 2 0.91
Heart_h3k4me3 2 0.97
Intestine_h3k4me3 2 0.94
E-brain_h3k4me3 2 0.97
E-trunk_h3k4me3 2 0.98
Kidney_h3k4me3 2 0.94
Liver_h3k4me3 2 0.82
Muscle_h3k4me3 2 0.98
Skin_h3k4me3 2 0.97
Spleen_h3k4me3 2 0.91
Testis_h3k4me3 2 0.88
Brain_H3K9me3 2 0.942
Colon_H3K9me3 2 0.91
Blood_H3K9me3 2 0.903
Heart_H3K9me3 2 0.926
Intestine_H3K9me3 2 0.91
Kidney_H3K9me3 2 0.937
Liver_H3K9me3 2 0.936
Muscle_H3K9me3 2 0.926
Skin_H3K9me3 2 0.913
Spleen_H3K9me3 2 0.916
Testis_H3K9me3 2 0.974
Brain_H3K9me2 2 0.916
Colon_H3K9me2 2 0.901
Blood_H3K9me2 2 0.893
Heart_H3K9me2 2 0.881
Intestine_H3K9me2 2 0.925
Kidney_H3K9me2 2 0.897
Liver_H3K9me2 2 0.899
Muscle_H3K9me2 2 0.89
Skin_H3K9me2 2 0.889
Spleen_H3K9me2 2 0.933
Testis_H3K9me2 2 0.922
Brain_ATAC 2 0.913
Colon_ATAC 2 0.908
Blood_ATAC 2 0.901
Heart_ATAC 2 0.925
Intestine_ATAC 2 0.865
Kidney_ATAC 2 0.914
Liver_ATAC 2 0.968
Muscle_ATAC 2 0.905
Skin_ATAC 2 0.941
Spleen_ATAC 2 0.911
Testis_ATAC 2 0.925
Brain_WGBS 1
Colon_WGBS 1
Blood_WGBS 1
Heart_WGBS 1
Intestine_WGBS 1
Kidney_WGBS 1
Liver_WGBS 1
Muscle_WGBS 1
Skin_WGBS 1
Spleen_WGBS 1
Testis_WGBS 1
Brain_HiC 2 0.981
Muscle_HiC 2 0.976
Brain_scATAC-seq 0.925
Sequencing depth (sample, total reads, read type, read length)
Brain_RNA-seq_rep1 57,399,962 paired-end 60
Blood_RNA-seq_rep1 33,709,554 paired-end 60
Colon_RNA-seq_rep1 64,407,490 paired-end 60
Heart_RNA-seq_rep1 57,992,414 paired-end 60
Intestine_RNA-seq_rep1 62,593,691 paired-end 60
E-brain_RNA-seq_rep1 81,655,736 paired-end 60
E-trunk_RNA-seq_rep1 81,968,040 paired-end 60
Kidney_RNA-seq_rep1 53,192,824 paired-end 60
Liver_RNA-seq_rep1 66,886,775 paired-end 60
Muscle_RNA-seq_rep1 61,018,951 paired-end 60
Skin_RNA-seq_rep1 43,248,349 paired-end 60
Spleen_RNA-seq_rep1 19,289,511 paired-end 98
Testis_RNA-seq_rep1 80,040,638 paired-end 60
Brain_h3K27ac_rep1 22,919,396 paired-end 150
Colon_h3K27ac_rep1 2,913,190 paired-end 150
Blood_h3K27ac_rep1 12,855,702 paired-end 150
Heart_h3K27ac_rep1 25,213,922 paired-end 150
Intestine_h3K27ac_rep1 37,480,680 paired-end 150
E-brain_h3K27ac_rep1 81,331,004 paired-end 150
E-trunk_h3K27ac_rep1 61,361,014 paired-end 150
Kidney_h3K27ac_rep1 31,757,872 paired-end 150
Liver_h3K27ac_rep1 18,410,148 paired-end 150
Muscle_h3K27ac_rep1 2,951,696 paired-end 150
Skin_h3K27ac_rep1 54,680,744 paired-end 150
Spleen_h3K27ac_rep1 7,119,136 paired-end 150
Testis_h3K27ac_rep1 12,227,784 paired-end 150
Brain_h3k4me3_rep1 20,554,470 paired-end 150
Colon_h3k4me3_rep1 4,982,782 paired-end 150
Blood_h3k4me3_rep1 14,199,752 paired-end 150
Heart_h3k4me3_rep1 25,106,098 paired-end 150
Intestine_h3k4me3_rep1 5,716,110 paired-end 150
E-brain_h3k4me3_rep1 22,133,422 paired-end 150
E-trunk_h3k4me3_rep1 50,430,196 paired-end 150
Kidney_h3k4me3_rep1 3,922,484 paired-end 150
Liver_h3k4me3_rep1 18,711,566 paired-end 150
Muscle_h3k4me3_rep1 29,684,894 paired-end 150
Skin_h3k4me3_rep1 58,024,592 paired-end 150
Spleen_h3k4me3_rep1 18,813,056 paired-end 150
Testis_h3k4me3_rep1 22,526,244 paired-end 150
Brain_H3K9me3_rep1 13,902,969 paired-end 150
Colon_H3K9me3_rep1 7,721,719 paired-end 150
Blood_H3K9me3_rep1 10,840,250 paired-end 150
Heart_H3K9me3_rep1 7,947,696 paired-end 150
Intestine_H3K9me3_rep1 12,958,288 paired-end 150
Kidney_H3K9me3_rep1 7,238,470 paired-end 150
Liver_H3K9me3_rep1 12,298,856 paired-end 150
Muscle_H3K9me3_rep1 14,540,724 paired-end 150
Skin_H3K9me3_rep1 9,293,298 paired-end 150
Spleen_H3K9me3_rep1 9,013,540 paired-end 150
Testis_H3K9me3_rep1 9,339,132 paired-end 150
Brain_ATAC_rep1 11,256,714 paired-end 150
Colon_ATAC_rep1 64,407,490 paired-end 150
Blood_ATAC_rep1 44,349,981 paired-end 150
Heart_ATAC_rep1 13,318,779 paired-end 150
Intestine_ATAC_rep1 21,043,459 paired-end 150
Kidney_ATAC_rep1 21,561,836 paired-end 150
Liver_ATAC_rep1 13,617,294 paired-end 150
Muscle_ATAC_rep1 20,315,786 paired-end 150
Skin_ATAC_rep1 14,112,784 paired-end 150
Spleen_ATAC_rep1 7,587,349 paired-end 150
Testis_ATAC_rep1 10,244,549 paired-end 150
Brain_WGBS_rep1 446,611,815 paired-end 150
Colon_WGBS_rep1 421,379,280 paired-end 150
Blood_WGBS_rep1 255,237,608 paired-end 150
Heart_WGBS_rep1 403,339,595 paired-end 150
Intestine_WGBS_rep1 409,562,435 paired-end 150
Kidney_WGBS_rep1 429,903,275 paired-end 150
Liver_WGBS_rep1 425,121,792 paired-end 150
Muscle_WGBS_rep1 454,789,887 paired-end 150
Skin_WGBS_rep1 429,245,446 paired-end 150
Spleen_WGBS_rep1 409,400,141 paired-end 150
Testis_WGBS_rep1 416,803,341 paired-end 150
Brain_HiC_rep1 191,703,862 paired-end 150
Muscle_HiC_rep1 323,085,950 paired-end 150
Brain_RNA-seq_rep2 42,061,260 paired-end 60
Blood_RNA-seq_rep2 32,282,328 paired-end 60
Colon_RNA-seq_rep2 41,366,162 paired-end 60
Heart_RNA-seq_rep2 56,548,432 paired-end 60
Intestine_RNA-seq_rep2 63,100,907 single-end 56
E-brain_RNA-seq_rep2 42,438,841 paired-end 99
E-trunk_RNA-seq_rep2 48,844,849 paired-end 99
Kidney_RNA-seq_rep2 33,744,731 paired-end 60
Liver_RNA-seq_rep2 38,926,981 paired-end 60
Muscle_RNA-seq_rep2 40,352,658 paired-end 51
Skin_RNA-seq_rep2 56,024,033 paired-end 60
Spleen_RNA-seq_rep2 77,413,639 paired-end 60
Testis_RNA-seq_rep2 41,712,778 paired-end 60
Brain_h3K27ac_rep2 12,514,004 paired-end 150
Colon_h3K27ac_rep2 28,798,332 paired-end 150
Blood_h3K27ac_rep2 13,316,975 paired-end 150
Heart_h3K27ac_rep2 1,470,040 paired-end 150
Intestine_h3K27ac_rep2 1,043,600 paired-end 150
E-brain_h3K27ac_rep2 35,534,782 paired-end 150
E-trunk_h3K27ac_rep2 52,479,248 paired-end 150
Liver_h3K27ac_rep2 19,622,870 paired-end 150
Muscle_h3K27ac_rep2 18,465,766 paired-end 150
Skin_h3K27ac_rep2 49,624,004 paired-end 150
Spleen_h3K27ac_rep2 42,619,886 paired-end 150
Testis_h3K27ac_rep2 10,459,980 paired-end 150
Brain_h3k4me3_rep2 40,158,348 paired-end 150
Colon_h3k4me3_rep2 2,213,328 paired-end 150
Blood_h3k4me3_rep2 17,854,940 paired-end 150
Heart_h3k4me3_rep2 40,874,600 paired-end 150
Intestine_h3k4me3_rep2 3,381,112 paired-end 150
E-brain_h3k4me3_rep2 35,017,716 paired-end 150
E-trunk_h3k4me3_rep2 50,553,374 paired-end 150
Kidney_h3k4me3_rep2 14,636,426 paired-end 150
Liver_h3k4me3_rep2 25,704,652 paired-end 150
Muscle_h3k4me3_rep2 9,398,482 paired-end 150
Skin_h3k4me3_rep2 29,972,300 paired-end 150
Spleen_h3k4me3_rep2 30,352,882 paired-end 150
Testis_h3k4me3_rep2 31,154,908 paired-end 150
Brain_ATAC_rep2 36,189,602 paired-end 150
Colon_ATAC_rep2 41,366,162 paired-end 150
Blood_ATAC_rep2 44,609,138 paired-end 150
Heart_ATAC_rep2 50,275,820 paired-end 150
Intestine_ATAC_rep2 8,289,902 paired-end 150
Kidney_ATAC_rep2 62,647,690 paired-end 150
Liver_ATAC_rep2 46,710,702 paired-end 150
Muscle_ATAC_rep2 10,111,831 paired-end 150
Skin_ATAC_rep2 86,132,201 paired-end 150
Spleen_ATAC_rep2 55,632,860 paired-end 150
Testis_ATAC_rep2 56,153,663 paired-end 150
Brain_HiC_rep2 546,389,510 paired-end 150
Muscle_HiC_rep2 331,791,203 paired-end 150
Brain_scATAC-seq_rep1 324,127,750 paired-end 150
Brain_scATAC-seq_rep2 330,445,166 paired-end 150
Antibodies Rabbit polyclonal histone H3K27ac antibody (Active Motif, 39133). Lot 31814008. Dilution ratio 1:100
Rabbit polyclonal histone H3K4me3 antibody (EMD Millipore, 07-473), clone MC135. Lot 2591879. Dilution ratio 1:100
Rabbit monoclonal histone H3K9me2 antibody (Cell Signaling, 4658). Lot GR3247768-1. Dilution ratio 1:100
Rabbit polyclonal histone H3K9me3 antibody (Abcam, ab8898). Lot GR3247768-1. Dilution ratio 1:100
Peak calling parameters H3K27ac and H3K4me3: macs2 callpeak -f BED -n -p 1e-2 --nomodel --shift 0 --extsize 150 --keep-dup all -B --SPMR
ATAC-seq: macs2 callpeak --shift 75 --extsize 150_window --nomodel -B --SPMR --keep-dup all --call-summits
H3K9me3 and H3K9me2 : findPeaks -style histone -region -size 1000 -minDist 5000
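As an illustration of how the histone settings above might be turned into full command lines, the following sketch assembles argument lists for MACS2 and HOMER. The input paths, the sample name passed to `-n`, the `-t` treatment flag and the HOMER tag-directory argument are assumptions added for illustration; the source lists only the option flags and does not give input files or control samples. The ATAC-seq line is left out because its `--extsize` value is not a plain number in the source.

```python
import shlex


def macs2_histone_argv(treatment_bed, name):
    """MACS2 argv using the H3K27ac/H3K4me3 flags quoted in the text.

    treatment_bed and name are hypothetical placeholders.
    """
    return [
        "macs2", "callpeak",
        "-t", treatment_bed,      # assumed: treatment file is not given in the source
        "-f", "BED",
        "-n", name,               # assumed: the source gives -n without a value
        "-p", "1e-2",
        "--nomodel", "--shift", "0", "--extsize", "150",
        "--keep-dup", "all",
        "-B", "--SPMR",
    ]


def homer_histone_argv(tag_dir):
    """HOMER findPeaks argv using the H3K9me3/H3K9me2 flags quoted in the text.

    tag_dir is a hypothetical HOMER tag directory.
    """
    return [
        "findPeaks", tag_dir,
        "-style", "histone",
        "-region",
        "-size", "1000",
        "-minDist", "5000",
    ]


if __name__ == "__main__":
    # Print the commands rather than running them; file names are made up.
    print(shlex.join(macs2_histone_argv("brain_H3K27ac.bed", "brain_H3K27ac")))
    print(shlex.join(homer_histone_argv("brain_H3K9me3_tags")))
```

Building the command as a list rather than a single string keeps each flag and its value explicit and avoids shell-quoting problems if the list is later passed to `subprocess.run`.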
Data quality 10496 blood_h3k9me3

20872 brain_h3k9me3
14715 colon_h3k9me3
15671 heart_h3k9me3
19356 intestine_h3k9me3
15291 kidney_h3k9me3
14825 liver_h3k9me3
7058 muscle_h3k9me3
13345 skin_h3k9me3
10774 spleen_h3k9me3
13164 testis_h3k9me3
12985 blood_h3k9me2
19872 brain_h3k9me2
13672 colon_h3k9me2
13258 heart_h3k9me2
13330 intestine_h3k9me2
8537 kidney_h3k9me2
11715 liver_h3k9me2
8876 muscle_h3k9me2
12237 skin_h3k9me2
10858 spleen_h3k9me2
13377 testis_h3k9me2
110482 blood.atac.fixed.500bp.most_significant.peak
142651 brain.atac.fixed.500bp.most_significant.peak
66771 colon.atac.fixed.500bp.most_significant.peak
137064 heart.atac.fixed.500bp.most_significant.peak
113787 intestine.atac.fixed.500bp.most_significant.peak
145525 kidney.atac.fixed.500bp.most_significant.peak
90895 liver.atac.fixed.500bp.most_significant.peak
105594 muscle.atac.fixed.500bp.most_significant.peak
180788 skin.atac.fixed.500bp.most_significant.peak
103231 spleen.atac.fixed.500bp.most_significant.peak
143452 testis.atac.fixed.500bp.most_significant.peak
26529 blood.h3k27ac.merge_x_ctl_for_rep1.peak.rpkm.filter_FC2_change_1
29767 brain.h3k27acxbrain.input.peak_region.rpkm.filter_FC2_change_1
38384 colon.h3k27acxcolon.input.peak_region.rpkm.filter_FC2_change_1
24990 e-brain.h3k27acxe-brain.input.peak_region.rpkm.filter_FC2_change_1
56391 e-muscle.h3k27acxe-muscle.input.peak_region.rpkm.filter_FC2_change_1
47199 heart.h3k27acxheart.input.peak_region.rpkm.filter_FC2_change_1
35349 intestine.h3k27acxintestine.input.peak_region.rpkm.filter_FC2_change_1
45105 kidney.h3k27acxkidney.to.h3k27ac.input.peak_region.rpkm.filter_FC2_change_1
52000 liver.h3k27acxliver.input.peak_region.rpkm.filter_FC2_change_1
74777 muscle.h3k27acxmuscle.input.peak_region.rpkm.filter_FC2_change_1
8724 skin.h3k27acxskin.input.peak_region.rpkm.filter_FC2_change_1
22015 spleen.h3k27acxspleen.to.h3k27ac.input.peak_region.rpkm.filter_FC2_change_1
24238 testis.h3k27acxtestis.input.peak_region.rpkm.filter_FC2_change_1
15521 blood.h3k4me3.merge_x_ctl_for_rep1.peak.rpkm.filter_FC2_change_1
21845 brain.h3k4me3xbrain.input.peak_region.rpkm.filter_FC2_change_1
24878 colon.h3k4me3xcolon.input.peak_region.rpkm.filter_FC2_change_1
36789 e-brain.h3k4me3xe-brain.input.peak_region.rpkm.filter_FC2_change_1
47734 e-muscle.h3k4me3xe-muscle.input.peak_region.rpkm.filter_FC2_change_1
20386 heart.h3k4me3xheart.input.peak_region.rpkm.filter_FC2_change_1
22039 intestine.h3k4me3xintestine.input.peak_region.rpkm.filter_FC2_change_1
26275 kidney.h3k4me3xkidney.input.peak_region.rpkm.filter_FC2_change_1
21490 liver.h3k4me3xliver.input.peak_region.rpkm.filter_FC2_change_1
23063 muscle.h3k4me3xmuscle.input.peak_region.rpkm.filter_FC2_change_1
18720 skin.h3k4me3xskin.input.peak_region.rpkm.filter_FC2_change_1
20429 spleen.h3k4me3xspleen.to.h3k4me3.input.peak_region.rpkm.filter_FC2_change_1
32333 testis.h3k4me3xtestis.input.peak_region.rpkm.filter_FC2_change_1
13171 brain_merge_scATAC.10_peaks.narrowPeak

Software MACS2, Homer
Article
Structure of LRRK2 in Parkinson’s disease and model for microtubule interaction

https://doi.org/10.1038/s41586-020-2673-2
Received: 10 November 2019
Accepted: 12 August 2020
Published online: 19 August 2020

C. K. Deniston1,7,11, J. Salogiannis1,2,11, S. Mathea3,11, D. M. Snead1, I. Lahiri1,8, M. Matyszewski1,
O. Donosa2, R. Watanabe4,9, J. Böhning4,10, A. K. Shiau5,6, S. Knapp3, E. Villa4,
S. L. Reck-Peterson1,2,6 ✉ & A. E. Leschziner1,4 ✉

Leucine-rich repeat kinase 2 (LRRK2) is the most commonly mutated gene in familial
Parkinson’s disease1 and is also linked to its idiopathic form2. LRRK2 has been
proposed to function in membrane trafficking3 and colocalizes with microtubules4.
Despite the fundamental importance of LRRK2 for understanding and treating
Parkinson’s disease, structural information on the enzyme is limited. Here we
report the structure of the catalytic half of LRRK2, and an atomic model of
microtubule-associated LRRK2 built using a reported cryo-electron tomography
in situ structure5. We propose that the conformation of the LRRK2 kinase domain
regulates its interactions with microtubules, with a closed conformation favouring
oligomerization on microtubules. We show that the catalytic half of LRRK2 is
sufficient for filament formation and blocks the motility of the microtubule-based
motors kinesin 1 and cytoplasmic dynein 1 in vitro. Kinase inhibitors that stabilize an
open conformation relieve this interference and reduce the formation of LRRK2
filaments in cells, whereas inhibitors that stabilize a closed conformation do not. Our
findings suggest that LRRK2 can act as a roadblock for microtubule-based motors and
have implications for the design of therapeutic LRRK2 kinase inhibitors.
LRRK2 is a large (288 kDa) multi-domain protein. Its amino-terminal half consists of repetitive protein interaction motifs (armadillo, ankyrin, and leucine-rich repeats), and its carboxy-terminal catalytic half contains a Ras-like GTPase (Ras-of-complex (ROC) domain), a kinase domain, and two other domains (C-terminal of ROC (COR), and WD40) (Fig. 1a). High-resolution structural data on LRRK2 is limited to bacterial homologues6,7 or isolated domains8,9, whereas low-resolution full-length protein structures have been obtained using negative-stain electron microscopy10 and cryo-electron microscopy (cryo-EM)11. A recent study of LRRK2 bound to microtubules in cells using cryo-electron tomography (cryo-ET) and subtomogram analysis led to a 14 Å structure and a proposed model of the catalytic half of LRRK2 (ref. 5).
The interaction of LRRK2 with microtubules is linked to disease as four of the five major mutations that cause Parkinson’s disease1 (Fig. 1a) enhance the microtubule association of LRRK2 (ref. 12). Furthermore, Rab GTPases, which mark membranous cargos that move along microtubules, are physiological LRRK2 kinase substrates13,14. These membranous cargos, and others implicated in Parkinson’s disease pathology3, are transported by the microtubule-based motors dynein and kinesin. Here, we set out to determine a high-resolution structure of the catalytic half of LRRK2 using cryo-EM, and to understand how it interacts with microtubules and how this affects microtubule-based motor movement.

Structure of the catalytic half of LRRK2
High-resolution studies on human LRRK2 have been limited by the lack of efficient expression systems that yield stable protein. We tested many constructs (Extended Data Fig. 1a) and identified one that consisted of the C-terminal half of wild-type LRRK2 (amino acids 1327–2527), which resulted in robust insect cell expression and well-behaved protein (Extended Data Fig. 1b, c). This construct consists of the ROC, COR, kinase and WD40 domains of LRRK2 (Fig. 1a), which we refer to as LRRK2RCKW. The COR domain was previously defined as consisting of two subdomains, COR-A and COR-B (ref. 6).
We determined a 3.5 Å structure of LRRK2RCKW in the presence of GDP using cryo-EM (Fig. 1b, c, Extended Data Figs. 1, 2). On our grids, we observed a mixture of monomers, dimers and head-to-tail trimers; we used the trimer to solve the structure (Fig. 1b, Extended Data Figs. 1, 2). Although crucial for reaching high resolution, the trimer is probably specific to the cryo-EM grid preparation as LRRK2RCKW is predominantly monomeric, with a smaller percentage of dimers, in solution (Extended Data Fig. 2n), and we only observed the trimer when preparing grids with high concentrations of LRRK2RCKW (Methods).
Owing to flexibility, the ROC and COR-A domains were lower resolution in our structure (Fig. 1b, c). We improved this part of the map using signal subtraction and focused 3D classification and refinement (Fig. 1d,
1Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA. 2Howard Hughes Medical Institute, Chevy Chase, MD, USA. 3Institute of Pharmaceutical
Chemistry, Goethe-Universität, Frankfurt, Germany. 4Division of Biological Sciences, Molecular Biology Section, University of California San Diego, La Jolla, CA, USA. 5Small Molecule Discovery
Program, Ludwig Institute for Cancer Research, La Jolla, CA, USA. 6Division of Biological Sciences, Cell and Developmental Biology Section, University of California San Diego, La Jolla, CA,
USA. 7Present address: Genomics Institute of the Novartis Research Foundation, La Jolla, CA, USA. 8Present address: Department of Biological Sciences, Indian Institute of Science Education
and Research Mohali, Mohali, India. 9Present address: La Jolla Institute for Immunology, La Jolla, CA, USA. 10Present address: Sir William Dunn School of Pathology, Oxford University, Oxford,
UK. 11These authors contributed equally: C. K. Deniston, J. Salogiannis, S. Mathea. ✉e-mail: sreckpeterson@ucsd.edu; aleschziner@ucsd.edu

Fig. 1 | Cryo-EM structure of LRRK2RCKW. a, Schematic of the construct used in this study. The N-terminal half of LRRK2, absent from our construct, is shown in dim colours. The same colour-coding of domains is used throughout the Article. The five major familial mutations in Parkinson’s disease and a mutation linked to Crohn’s disease are indicated. b, c, A 3.5 Å cryo-EM map (b) and local resolution (c) of the LRRK2RCKW trimer, with one monomer highlighted. d, e, A 3.8 Å cryo-EM map (d) and local resolution (e) of a LRRK2RCKW monomer with improved resolution for the ROC and COR-A domains. f, Ribbon diagram of the atomic model of LRRK2RCKW. g, An 8.1 Å cryo-EM map of monomeric LRRK2RCKW with the model in f docked in. h, Location of the Parkinson’s and Crohn’s disease mutations listed in a. i, j, Interface between the C-terminal helix and the kinase domain in LRRK2RCKW with residues involved in electrostatic and hydrophobic interactions indicated.
e, Extended Data Fig. 2a–d). The final model was generated using the signal-subtracted maps of the ROC and COR-A domains, and then combined with the COR-B, kinase and WD40 domains from the trimer map (Fig. 1f, Extended Data Fig. 2e–m, Supplementary Video 1). Our model fits well into an 8.1 Å reconstruction we obtained of a LRRK2RCKW monomer (Fig. 1g, Extended Data Fig. 6), which indicates that trimer formation does not cause major structural changes in the protein.
LRRK2RCKW adopts an overall J-shape, with the WD40, kinase and COR-B domains arranged along one axis, and COR-A and ROC turning around back towards the kinase. This brings the COR-A and the tightly associated ROC domain into close proximity to the kinase C-lobe (Fig. 1f, Supplementary Video 1). This arrangement probably underpins the crosstalk between the LRRK2 kinase and GTPase15,16. Part of the FERM domain in the FAK–FERM complex approaches the FAK C-lobe in a similar way17 (Extended Data Fig. 3a, b). The ROC, COR-A and COR-B domains are arranged as seen in crystal structures of LRRK2 bacterial homologues6,7,18. The N-lobe of the kinase domain, in particular its αC helix, forms an extensive interaction with the COR-B domain, with COR-B occupying a location similar to cyclin A in CDK2–cyclin A (ref. 19) (Extended Data Fig. 3a, c).
The kinase in our LRRK2RCKW structure is in an open, inactive conformation. Its activation loop contains the site of two familial mutations found in Parkinson’s disease (G2019S and I2020T) and is disordered beyond G2019 (Fig. 1h, Extended Data Fig. 2h, Supplementary Video 2). R1441 and Y1699 are the sites of three other familial Parkinson’s disease mutations and are located at the ROC and COR-B interface (Fig. 1h, Extended Data Fig. 2j, Supplementary Video 2). Because the kinase and GTPase interact with each other via the COR-A domain, it is possible that these mutations, located at the interface between the GTPase and COR-B, alter the conformational landscape of LRRK2 in response to ligands and/or regulatory signals and therefore affect the crosstalk between the LRRK2 catalytic domains.
A unique feature of LRRK2 is a 28-amino-acid α-helix located at its extreme C terminus, after the WD40 domain (Fig. 1i, j, Supplementary Video 3). Although several other kinases have α-helices in the same general location, none form interactions as extensive as those observed in LRRK2 (Fig. 1i, j, Extended Data Fig. 3d–i). Deletion of this helix resulted in an insoluble protein (Extended Data Fig. 1a, b). A residue near its end (T2524) is a known phosphorylation site for LRRK2 (ref. 20). Owing to the close proximity between T2524 and the N-lobe of the kinase domain, as well as the adjacent COR-B domain, we hypothesize that phosphorylation of this residue may be involved in regulation of the kinase. Because the last two residues of the C-terminal helix are disordered in our structure, as is a neighbouring loop in COR-B, it is possible that conditions exist in which these regions become ordered and turn the C-terminal helix into a scaffolding element that connects COR-B, the kinase and the WD40 domains.
We modelled the leucine-rich repeats (LRR) into LRRK2RCKW by using a crystal structure of the LRR, ROC and COR domains of the Chlorobium tepidum Roco protein7 (Extended Data Fig. 3l–p). In our model, the LRR wraps around the N-lobe of the kinase and approaches the C-lobe, placing the known S1292 autophosphorylation site in the LRR close to the active site of the kinase, and the Crohn’s-disease-associated residue N2081 (ref. 21), located in the kinase C-lobe, next to the LRR (Extended Data Fig. 3q), suggesting the functional relevance of this predicted interface.

Model of microtubule-bound filaments
A 14 Å structure of microtubule-associated filaments of full-length LRRK2 (carrying the filament-promoting I2020T mutation12) was recently determined using in situ cryo-ET and subtomogram analysis5 (Fig. 2a). The LRRK2 filaments formed on microtubules are right-handed5. Because microtubules are left-handed and no strong density connected the LRRK2 filament to the microtubule surface5, it is not known whether the LRRK2 microtubule interaction is direct. To address this, we combined purified microtubules and LRRK2RCKW, either wild type or I2020T, and imaged them by cryo-EM. Both wild-type and I2020T mutant LRRK2RCKW bound to microtubules, and diffraction

Article
Fig. 2 | Modelling the microtubule-associated LRRK2 filaments. a, A 14 Å cryo-ET map of a segment of microtubule-associated LRRK2 filament in cells. The microtubule is shown in blue and the LRRK2 filament in grey. b, Microtubule-associated LRRK2RCKW filaments reconstituted in vitro from purified components. Top, single cryo-EM images of a naked microtubule (left), and wild-type (WT) (centre) and I2020T (right) LRRK2RCKW filaments. Bottom, diffraction patterns (power spectra) calculated from the images above. Filled and open arrowheads indicate the layer lines corresponding to the microtubule and LRRK2RCKW, respectively. Scale bar, 20 nm. c, Fitting of the LRRK2RCKW structure, which has its kinase in an open conformation, into the cryo-ET map. d, Atomic model of the LRRK2RCKW filaments from c. The white circle highlights the filament interface mediated by interactions between COR domains, where clashes are found. e, Superposition of the LRRK2RCKW structure (coloured by domains) and a model of LRRK2RCKW with its kinase in a closed conformation in blue. The dashed blue arrow indicates the closing of the kinase. f, Fitting of the closed-kinase model of LRRK2RCKW into the cryo-ET map. g, Atomic model of the closed-kinase LRRK2RCKW filaments from f with a white circle highlighting the same interface as in d. h, i, Cartoon representation of the two filament models, highlighting the clashes observed with open-kinase LRRK2RCKW (h) and resolved with the closed-kinase model (i). In total, 82% of clashes were resolved using the closed-kinase LRRK2RCKW model (Methods).
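The diffraction patterns in Fig. 2b are power spectra of the micrographs. As a minimal sketch of that idea (not the authors' processing pipeline), the example below computes the power spectrum of a one-dimensional density profile taken along a filament axis. The synthetic profile, its 128-pixel length and its 8-pixel repeat are assumptions chosen for illustration. A periodic repeat concentrates power at a single spatial frequency, which is what appears as a layer line in the two-dimensional pattern.

```python
import cmath
import math

def power_spectrum_1d(profile):
    """Power spectrum |DFT|^2 of a 1D density profile (naive O(n^2) DFT)."""
    n = len(profile)
    mean = sum(profile) / n
    centred = [p - mean for p in profile]          # remove the zero-frequency term
    spectrum = []
    for k in range(n // 2 + 1):                    # non-negative frequencies only
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(centred))
        spectrum.append(abs(coeff) ** 2)
    return spectrum

def strongest_frequency(profile):
    """Frequency index (cycles per profile length) carrying the most power."""
    spectrum = power_spectrum_1d(profile)
    return max(range(1, len(spectrum)), key=lambda k: spectrum[k])

# Synthetic axial profile (assumed values): density repeating every 8 pixels.
n_pixels, repeat = 128, 8
profile = [math.cos(2 * math.pi * i / repeat) for i in range(n_pixels)]

peak = strongest_frequency(profile)        # frequency index of the "layer line"
repeat_from_peak = n_pixels / peak         # repeat distance recovered from the peak
```

In this picture, a layer line at a lower frequency corresponds to a longer repeat distance along the filament.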
patterns calculated from the images showed layer lines consistent with the formation of ordered filaments (Fig. 2b). Therefore, the interaction between LRRK2 and microtubules is direct and the catalytic C-terminal half of LRRK2 is sufficient for the formation of microtubule-associated filaments. The layer line patterns of wild-type and I2020T mutant LRRK2RCKW are different, with the I2020T diffraction pattern having an additional layer line of lower frequency, which indicates longer-range order in the filaments (Fig. 2b). This is consistent with the observation that the I2020T mutation promotes microtubule association by LRRK2 in cells12. Understanding the structural basis for this effect will require high-resolution structures of the filaments formed by wild-type and I2020T mutant LRRK2.

Integrative modelling was previously used to build a model into the in situ structure of microtubule-associated LRRK25. This modelling indicated that the well-resolved cryo-ET density closest to the microtubule consisted of the ROC, COR, kinase and WD40 domains and gave orientation ensembles for each domain5 that are in good agreement with our high-resolution structure of LRRK2RCKW (Extended Data Fig. 4a). Here, we built an atomic model of the microtubule-bound LRRK2 filaments by combining our 3.5 Å structure of LRRK2RCKW with the 14 Å in situ structure of microtubule-associated LRRK2 (Extended Data Fig. 4b–f). This showed that the LRRK2RCKW structure is sufficient to account for the density seen in the in situ structure (Fig. 2c), in agreement with our ability to reconstitute microtubule-associated LRRK2RCKW filaments in vitro (Fig. 2b), and with the earlier modelling5.

Although our LRRK2RCKW structure fits the overall shape of the cryo-ET map, there were notable clashes at the COR domain interfaces (Fig. 2d). Because the kinase in our LRRK2RCKW structure is in an open conformation, we hypothesized that filament formation might require the LRRK2 kinase to be in a closed conformation. To test this, we modelled a kinase-closed LRRK2RCKW (Fig. 2e, Extended Data Fig. 4g–j) and used it to rebuild the LRRK2 filament. The kinase-closed LRRK2RCKW model resolved more than 80% of the backbone clashes we had observed with our kinase-open LRRK2RCKW structure (Fig. 2c, d, f, g). A closed conformation for the kinase was also proposed by the integrative modelling5. Given these data, we hypothesize that the conformation of LRRK2 controls its ability to oligomerize on microtubules, with a closed kinase promoting oligomerization and an open (inactive) one disfavouring it (Fig. 2h, i).

The LRRK2 filaments in our kinase-closed model are formed by two homotypic interactions: one is mediated by the WD40 domain and the other by the COR-A and COR-B domains (Fig. 3a–d). Similar interfaces were reported on the basis of the cryo-ET structure5. We also solved structures of LRRK2RCKW dimers, using the same grids that yielded the 3.5 Å structure of LRRK2RCKW. We obtained structures of both the WD40–WD40- and COR–COR-mediated dimers, which indicates that both interfaces mediate dimerization in the absence of microtubules (Fig. 3e, f, Extended Data Figs. 5, 6). The interface in the COR-mediated dimer of LRRK2RCKW differs from that reported for the C. tepidum Roco protein6,7; although the GTPase domains interact directly in the dimer of the bacterial protein7, they are not involved in the dimerization interface we observed for LRRK2 (Extended Data Fig. 6c).

We built an independent model of a closed-kinase LRRK2RCKW by splitting our 3.5 Å structure in half at the junction between the N- and C-lobes of the kinase, and fitting the fragments into our cryo-EM map of a WD40–WD40 dimer obtained in the presence of a LRRK2-specific type I kinase inhibitor MLi-222,23, which is predicted to stabilize the closed conformation of the kinase (Extended Data Fig. 7a–c).
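The comparison of backbone clashes between the kinase-open and kinase-closed filament models comes down to a distance check between atoms of neighbouring subunits. The sketch below is illustrative only. The 3 Å cutoff, the toy coordinates and the function names are assumptions, and the criterion actually used is the one given in the Methods.

```python
import math

def count_clashes(chain_a, chain_b, cutoff=3.0):
    """Number of atoms in chain_a closer than `cutoff` (Å) to any atom in chain_b."""
    return sum(
        1 for a in chain_a
        if any(math.dist(a, b) < cutoff for b in chain_b)
    )

def percent_resolved(clashes_before, clashes_after):
    """Percentage of the original clashes that are no longer present."""
    return 100.0 * (clashes_before - clashes_after) / clashes_before

# Toy coordinates in Å (hypothetical, for illustration only).
neighbour = [(float(x), 0.0, 0.0) for x in range(5)]
# "Open" model: every atom sits 2 Å from the neighbouring subunit.
open_model = [(float(x), 2.0, 0.0) for x in range(5)]
# "Closed" model: four of the five atoms have moved 5 Å away.
closed_model = [(0.0, 2.0, 0.0)] + [(float(x), 5.0, 0.0) for x in range(1, 5)]

before = count_clashes(open_model, neighbour)
after = count_clashes(closed_model, neighbour)
resolved = percent_resolved(before, after)
```

With these toy inputs, four of the five clashes disappear on closing, which is the kind of ratio reported as a percentage of clashes resolved.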

Fig. 3 | LRRK2RCKW forms WD40- and COR-mediated dimers outside the filaments. a–d, The filament model shown in Fig. 2h, i is shown here in grey, with either a WD40-mediated (a), or COR-mediated (c) LRRK2RCKW dimer highlighted with domain colours. The corresponding molecular models are shown next to the cartoons (b, d). e, f, Cryo-EM reconstructions of LRRK2RCKW dimers obtained in the absence of inhibitor (‘apo’), or in the presence of MLi-2. For each reconstruction, two orientations of the map are shown: down the two-fold axis at the dimerization interface (left), which matches the orientation of the models shown in b and d, and perpendicular to it (right). The top row shows the cryo-EM map and the bottom row a transparent version of it with a model docked in. g, Molecular models of the WD40-mediated and COR-mediated LRRK2RCKW dimers obtained in the presence of MLi-2 (e, f) were aligned in alternating order. This panel shows the resulting right-handed helix. h, The helix has dimensions that are compatible with the diameter of a 12-protofilament microtubule (EMD-5192)44, which was the species used to obtain the cryo-ET map shown in Fig. 2a5, and has its ROC domains pointing towards the microtubule surface.
We then docked this closed-kinase model (Extended Data Fig. 7c) into the cryo-EM maps of WD40- and COR-mediated dimers obtained in the presence of MLi-2 to generate molecular models of both dimers (Extended Data Fig. 7d, e). We aligned these models to build a polymer in silico. This resulted in a right-handed helix with the same general geometric properties seen in the cellular LRRK2 filaments, which indicates that those properties are largely encoded in the structure of LRRK2RCKW itself (Fig. 3g, h, Extended Data Fig. 7f). Docking the same two halves of LRRK2RCKW into the cryo-EM map of a monomer we obtained in the absence of inhibitors or ATP led to a structure similar to our 3.5 Å structure obtained from trimers, further confirming that trimer formation does not alter the conformation of LRRK2RCKW (Extended Data Fig. 7g, h).

These data, along with the apparent lack of any residue-specific interactions between LRRK2 and the microtubule lattice, suggest that the microtubule may provide a surface for LRRK2 to oligomerize on using interfaces that exist in solution, therefore explaining the symmetry mismatch between the microtubule and the LRRK2 filament. Consistent with this, the surface charge of the microtubule facing the LRRK2RCKW filament is acidic, whereas there are basic patches on the LRRK2RCKW filament that face the microtubule (Extended Data Fig. 7i–l). The unstructured C-terminal tails of α- and β-tubulin, which were not included in the surface charge calculations, are also acidic.

LRRK2RCKW inhibits kinesin and dynein
To test our hypothesis that the conformation of the LRRK2 kinase domain regulates its interaction with microtubules, we needed a sensitive assay to measure the association of LRRK2RCKW with microtubules and a means to control the conformation of its kinase. We monitored microtubule association by measuring the effect of LRRK2RCKW on microtubule-based motor motility. We used a truncated human kinesin 1, KIF5B (‘kinesin’)24, which moves towards the microtubule plus end, and the activated human cytoplasmic dynein-1–dynactin–ninein-like complex (‘dynein’)25, which moves in the opposite direction. Using single-molecule in vitro motility assays (Fig. 4a), we found that low nanomolar concentrations of LRRK2RCKW inhibited the movement of both kinesin and dynein, with near complete inhibition at 25 nM LRRK2RCKW (Fig. 4b, c, Extended Data Fig. 8a, b). We hypothesized that LRRK2RCKW was acting as a roadblock for the motors. In agreement with this, the distance that single kinesins moved (run length) was reduced (Fig. 4d), whereas their velocity remained relatively constant (Fig. 4e). We obtained similar results with dynein (Fig. 4f). Other microtubule-associated proteins, such as MAP2 and Tau, also inhibit kinesin, but not dynein26,27, probably owing to the ability of dynein to side-step on the microtubule28–30. The unusual ability of LRRK2 to inhibit dynein may be a consequence of it forming oligomers that cannot be overcome by sidestepping.

We also tested the inhibition of kinesin by I2020T mutant LRRK2RCKW, which promotes the formation of filaments when overexpressed in cells12. I2020T mutant LRRK2RCKW inhibited kinesin to a similar extent as wild-type LRRK2RCKW (Extended Data Fig. 8c, d). Because the in vitro reconstituted filaments of I2020T mutant LRRK2RCKW show longer range

Fig. 4 | LRRK2RCKW inhibits the motility of kinesin and dynein. a, Schematic of the single-molecule motility assay. b, c, The percentage (mean ± s.d.) of motile events per microtubule as a function of LRRK2RCKW concentration for kinesin (b) and dynein (c). ****P < 0.0001, Kruskal–Wallis test with Dunn’s post hoc for multiple comparisons for (b) or Mann–Whitney test (c). d, Cumulative frequency distribution of kinesin run lengths as a function of LRRK2RCKW concentration. Mean decay constants (tau) are shown. The 12.5 nM and 25 nM, but not 6.25 nM, conditions were significantly different (P < 0.0001) than the 0 nM condition (one-way analysis of variance (ANOVA) with Dunnett’s test for multiple comparisons using error generated from a bootstrapping analysis). e, Velocity of kinesin as a function of LRRK2RCKW concentration. Data are mean ± s.d. ****P < 0.0001, one-way ANOVA with Dunn’s post hoc for multiple comparisons. f, Cumulative frequency distribution of dynein run lengths as a function of LRRK2RCKW concentration. Mean decay constants (tau) are shown. Data were resampled with bootstrapping analysis and were significant. P < 0.0001, unpaired t-test with Welch’s correction using error generated from a bootstrapping analysis.
Panel legends: d, kinesin run-length decay constants (tau) of 1.667, 1.570, 1.048 and 0.813 μm at 0, 6.25, 12.5 and 25 nM LRRK2RCKW; f, dynein decay constants of 4.980 and 0.846 μm at 0 and 25 nM LRRK2RCKW.
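The decay constants (tau) quoted for Fig. 4d, f summarize cumulative run-length distributions, with errors taken from bootstrapping. As a rough sketch of one common way to obtain such a value (not the authors' analysis code), the example below fits a single-exponential model, 1 − exp(−x/tau), to an empirical cumulative distribution by least squares and resamples the data to estimate the spread of the fit. The single-exponential form, the synthetic run lengths, the random seeds and all function names are assumptions.

```python
import math
import random

def empirical_cdf(run_lengths):
    """Sorted run lengths paired with their cumulative frequencies."""
    xs = sorted(run_lengths)
    n = len(xs)
    return xs, [(i + 1) / n for i in range(n)]

def fit_tau(run_lengths, lo=0.01, hi=50.0, iterations=60):
    """Least-squares fit of 1 - exp(-x / tau) to the empirical CDF."""
    xs, ys = empirical_cdf(run_lengths)

    def sse(tau):
        return sum((y - (1.0 - math.exp(-x / tau))) ** 2 for x, y in zip(xs, ys))

    # Ternary search; assumes the squared error has a single minimum in [lo, hi].
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if sse(m1) < sse(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

def bootstrap_tau(run_lengths, n_resamples=50, seed=0):
    """Mean and standard deviation of tau over resampled data sets."""
    rng = random.Random(seed)
    taus = []
    for _ in range(n_resamples):
        sample = [rng.choice(run_lengths) for _ in run_lengths]
        taus.append(fit_tau(sample))
    mean = sum(taus) / len(taus)
    sd = math.sqrt(sum((t - mean) ** 2 for t in taus) / (len(taus) - 1))
    return mean, sd

# Synthetic run lengths in μm drawn from an exponential with tau = 1.5 μm (assumed).
rng = random.Random(1)
true_tau = 1.5
runs = [rng.expovariate(1.0 / true_tau) for _ in range(200)]

tau_hat = fit_tau(runs)
tau_mean, tau_sd = bootstrap_tau(runs)
```

The bootstrap standard deviation gives the kind of error estimate that the statistical tests named in the legend can then use to compare conditions.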

Fig. 5 | Type II, but not type I, kinase inhibitors rescue kinesin and dynein motility and reduce LRRK2 filament formation in cells. a, b, Effects of different kinase inhibitors on LRRK2RCKW’s inhibition of kinesin (a) and dynein (b) motility. Data shown are the percentage of motile events per microtubule (MT) as a function of LRRK2RCKW concentration in the absence (DMSO) or presence of the indicated inhibitors (ponatinib (Pon.) and GZD-824 (GZD): 10 μM; MLi-2 and LRRK2-IN-1 (IN-1): 1 μM). Data are mean ± s.d. ***P < 0.001, ****P < 0.0001, Kruskal–Wallis test with Dunn’s post hoc for multiple comparisons within drug only. DMSO conditions reproduced from Fig. 4c for comparison. c, Treatment with MLi-2 (500 nM) for 2 h increases the formation of wild-type GFP–LRRK2 filaments in 293T cells. Data are mean ± s.d. ****P = 0.0002, Mann–Whitney test. d, Treatment with GZD-824 (5 μM) for 30 min decreases the formation of GFP–LRRK2(I2020T) filaments in 293 cells. Data are mean ± s.d. *P = 0.0133, **P = 0.0012, Kruskal–Wallis with Dunn’s post hoc test for multiple comparisons. e, Schematic of our hypothesis. The LRRK2 kinase can be in an open or closed conformation. The different species we observed are represented in the rounded rectangles, but only monomers are shown on the microtubule for simplicity. Our model proposes that the kinase-closed form of LRRK2 favours oligomerization on microtubules.
order than wild-type LRRK2RCKW (Fig. 2b), it is possible that the high sensitivity of the single-molecule motility assays does not allow us to distinguish differences in oligomer length and/or stability between wild-type and I2020T mutant LRRK2RCKW.

Type II inhibitors rescue the motors
Our hypothesis predicts that the closed conformation of the LRRK2 kinase domain will favour the oligomerization of LRRK2 on microtubules. By contrast, it predicts that conditions that stabilize the kinase in an open conformation will prevent oligomerization and therefore decrease microtubule binding by LRRK2RCKW, resulting in the relief of LRRK2RCKW-dependent inhibition of kinesin and dynein motility. To test these predictions, we searched for a type II kinase inhibitor that binds tightly to LRRK2 with structural evidence that it stabilizes an open kinase conformation. We selected ponatinib, which has an inhibition constant (Ki) for LRRK2 of 31 nM31, and crystal structures show it bound to RIPK232 and IRAK4 (Protein Data Bank (PDB) code 6EG9) in open conformations (Extended Data Fig. 9a). Ponatinib inhibited the ability of LRRK2RCKW to phosphorylate Rab8A13 (Extended Data Fig. 9f).

As predicted, ponatinib rescued kinesin motility in a dose-dependent manner at concentrations of LRRK2RCKW (25 nM) that had resulted in almost complete inhibition of kinesin (Fig. 5a, Extended Data Fig. 9g–j). We observed similar effects with GZD-824, a chemically related type II kinase inhibitor33 (Fig. 5a, Extended Data Fig. 9i, j). Our hypothesis also predicted that kinase inhibitors that stabilize the closed form of the kinase would not rescue the motors and might enhance the inhibitory effect of LRRK2RCKW by increasing its ability to form filaments on microtubules. Indeed, MLi-222,23 and another LRRK2-specific type I inhibitor, LRRK2-IN-134, which are both expected22,35 to stabilize a closed conformation of the kinase (Extended Data Fig. 9b–e), further enhanced the inhibitory activity of LRRK2RCKW on kinesin (Fig. 5a). Dynein motility was also rescued by ponatinib and GZD-824, but not by MLi-2 or LRRK2-IN-1 (Fig. 5b, Extended Data Fig. 9i, k). Similar to ponatinib, GZD-824, MLi-2 and LRRK2-IN-1 inhibited the phosphorylation of Rab8A by LRRK2RCKW (Extended Data Fig. 9f).

GZD-824 reduces filaments in cells
In cells expressing high levels of LRRK2, LRRK2 forms filaments that colocalize with a subset of microtubules and are sensitive to the microtubule-depolymerizing drug nocodazole12. This association is enhanced by the Parkinson’s disease-linked mutations R1441C, R1441G, Y1699C and I2020T12,36 and by type I kinase inhibitors34,37. We tested our kinase conformation hypothesis in human 293T cells by determining whether type I and type II kinase inhibitors had opposite effects on the formation of microtubule-associated LRRK2 filaments in cells. Consistent with previous findings37,38, the type I inhibitor MLi-2 enhanced the microtubule association of LRRK2 (Fig. 5c), which suggests that the closed conformation of the kinase favours binding to microtubules in cells. By contrast, the type II inhibitor GZD-824 reduced the filament-forming ability of overexpressed LRRK2(I2020T) (Fig. 5d). This reduction in LRRK2 filament formation was not due to changes in the amount of LRRK2 protein expression or the overall architecture of the microtubule cytoskeleton (Extended Data Fig. 9l–n).

Conclusions
Here we reported the 3.5 Å structure of the catalytic half of LRRK2 and used it, in combination with a 14 Å cryo-ET structure of microtubule-associated LRRK2 filaments5, to build an atomic model of these filaments. This modelling led us to hypothesize that the conformation of the LRRK2 kinase controls its association with microtubules (Fig. 5e). Cryo-EM structures of dimers of LRRK2RCKW obtained in the absence of microtubules showed that the same interfaces are involved in both dimerization and filament formation, and aligning them in silico resulted in a right-handed filament with similar properties to LRRK2 filaments observed in cells, which suggests that the ability of LRRK2 to form filaments is a property inherent to LRRK2, and specifically the RCKW domains. We propose that both the surface charge and geometric complementarity between the microtubule and LRRK2 promote the formation of LRRK2 filaments. It remains to be determined whether LRRK2RCKW monomers or dimers are the minimal filament-forming unit.

We tested our model that the conformation of the LRRK2 kinase regulates microtubule association using kinase inhibitors expected to stabilize either the open (type II) or closed (type I) conformations of the kinase. In support of our model, type II inhibitors relieved the LRRK2RCKW-dependent inhibition of kinesin and dynein and reduced the formation of LRRK2 filaments in cells, whereas type I inhibitors were unable to rescue motor motility and enhanced filament formation in cells. Notably, our single-molecule motility assays showed that low nanomolar concentrations of LRRK2RCKW negatively affect both kinesin and dynein. At these low concentrations, it is likely that LRRK2 would not form the long, highly ordered filaments and microtubule bundles observed in cells that overexpress the protein; instead, we hypothesize that at lower expression levels in cells LRRK2 forms short oligomers on microtubules.

The physiological role of non-pathogenic microtubule-associated LRRK2 is not known. Our data show that LRRK2 acts as a roadblock for microtubule-based motors in vitro. In cells, dynein and kinesin bind directly or indirectly to many Rab-marked membranous cargos39–43. Our data also show that the microtubule-associated form of LRRK2 has its kinase in a closed (and potentially active) conformation. Microtubule-associated LRRK2 stalling of kinesin or dynein could therefore increase the likelihood that LRRK2 phosphorylates cargo-associated Rab proteins13, modulating effector binding13 and resulting in changes in cargo dynamics. In support of this, the four mutations associated with Parkinson’s disease that enhance LRRK2 microtubule binding12 also show higher levels of Rab phosphorylation in cells than the G2019S mutant13,14, which does not show enhanced microtubule binding compared with wild-type LRRK212.

Our data have important implications for the design of LRRK2 kinase inhibitors for therapeutic purposes. We predict that inhibitors that favour the closed conformation of the kinase will promote LRRK2 filament formation, and therefore could block microtubule-based trafficking, whereas inhibitors that favour an open conformation of the kinase will not. These results should be taken into account to enhance the therapeutically beneficial effects of LRRK2 kinase inhibition and to avoid potential unintended side effects.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2673-2.

1. Monfrini, E. & Di Fonzo, A. Leucine-rich repeat kinase (LRRK2) genetics and Parkinson’s disease. Adv. Neurobiol. 14, 3–30 (2017).
2. Di Maio, R. et al. LRRK2 activation in idiopathic Parkinson’s disease. Sci. Transl. Med. 10, eaar5429 (2018).
3. Abeliovich, A. & Gitler, A. D. Defects in trafficking bridge Parkinson’s disease pathology and genetics. Nature 539, 207–216 (2016).
4. Gloeckner, C. J. et al. The Parkinson disease causing LRRK2 mutation I2020T is associated with increased kinase activity. Hum. Mol. Genet. 15, 223–232 (2006).
5. Watanabe, R. et al. The in situ structure of Parkinson’s disease-linked LRRK2. Cell 182, 1508–1518.e16 (2020).
6. Gotthardt, K., Weyand, M., Kortholt, A., Van Haastert, P. J. M. & Wittinghofer, A. Structure of the Roc-COR domain tandem of C. tepidum, a prokaryotic homologue of the human LRRK2 Parkinson kinase. EMBO J. 27, 2239–2249 (2008).
7. Deyaert, E. et al. Structure and nucleotide-induced conformational dynamics of the Chlorobium tepidum Roco protein. Biochem. J. 476, 51–66 (2019).
8. Deng, J. et al. Structure of the ROC domain from the Parkinson’s disease-associated leucine-rich repeat kinase 2 reveals a dimeric GTPase. Proc. Natl Acad. Sci. USA 105, 1499–1504 (2008).
9. Zhang, P. et al. Crystal structure of the WD40 domain dimer of LRRK2. Proc. Natl Acad. Sci. USA 116, 1579–1584 (2019).
10. Guaitoli, G. et al. Structural model of the dimeric Parkinson’s protein LRRK2 reveals a compact architecture involving distant interdomain contacts. Proc. Natl Acad. Sci. USA 113, E4357–E4366 (2016).
11. Sejwal, K. et al. Cryo-EM analysis of homodimeric full-length LRRK2 and LRRK1 protein complexes. Sci. Rep. 7, 8667 (2017).
12. Kett, L. R. et al. LRRK2 Parkinson disease mutations enhance its microtubule association. Hum. Mol. Genet. 21, 890–899 (2012).
13. Steger, M. et al. Phosphoproteomics reveals that Parkinson’s disease kinase LRRK2 regulates a subset of Rab GTPases. eLife 5, e12813 (2016).
14. Steger, M. et al. Systematic proteomic analysis of LRRK2-mediated Rab GTPase phosphorylation establishes a connection to ciliogenesis. eLife 6, e31012 (2017).
15. Ito, G. et al. GTP binding is essential to the protein kinase activity of LRRK2, a causative gene product for familial Parkinson’s disease. Biochemistry 46, 1380–1388 (2007).
16. West, A. B. et al. Parkinson’s disease-associated mutations in LRRK2 link enhanced GTP-binding and kinase activities to neuronal toxicity. Hum. Mol. Genet. 16, 223–232 (2007).
17. Lietha, D. et al. Structural basis for the autoinhibition of focal adhesion kinase. Cell 129, 1177–1187 (2007).
18. Terheyden, S., Ho, F. Y., Gilsbach, B. K., Wittinghofer, A. & Kortholt, A. Revisiting the Roco G-protein cycle. Biochem. J. 465, 139–147 (2015).
19. Cheng, K.-Y. et al. The role of the phospho-CDK2/cyclin A recruitment site in substrate recognition. J. Biol. Chem. 281, 23167–23179 (2006).
20. Pungaliya, P. P. et al. Identification and characterization of a leucine-rich repeat kinase 2 (LRRK2) consensus phosphorylation motif. PLoS ONE 5, e13672 (2010).
21. Hui, K. Y. et al. Functional variants in the LRRK2 gene confer shared effects on risk for Crohn’s disease and Parkinson’s disease. Sci. Transl. Med. 10, eaai7795 (2018).
22. Scott, J. D. et al. Discovery of a 3-(4-pyrimidinyl) indazole (MLi-2), an orally available and selective leucine-rich repeat kinase 2 (LRRK2) inhibitor that reduces brain kinase activity. J. Med. Chem. 60, 2983–2992 (2017).
23. Fell, M. J. et al. MLi-2, a potent, selective, and centrally active compound for exploring the therapeutic potential and safety of LRRK2 kinase inhibition. J. Pharmacol. Exp. Ther. 355, 397–409 (2015).
24. Case, R. B., Pierce, D. W., Hom-Booher, N., Hart, C. L. & Vale, R. D. The directional preference of kinesin motors is specified by an element outside of the motor catalytic domain. Cell 90, 959–966 (1997).
25. Redwine, W. B. et al. The human cytoplasmic dynein interactome reveals novel activators of motility. eLife 6, e28257 (2017).
26. Dixit, R., Ross, J. L., Goldman, Y. E. & Holzbaur, E. L. F. Differential regulation of dynein and kinesin motor proteins by tau. Science 319, 1086–1089 (2008).
27. Monroy, B. Y. et al. A combinatorial MAP Code dictates polarized microtubule transport. Dev. Cell 53, 60–72.e4 (2020).
28. Reck-Peterson, S. L. et al. Single-molecule analysis of dynein processivity and stepping behavior. Cell 126, 335–348 (2006).
29. Qiu, W. et al. Dynein achieves processive motion using both stochastic and coordinated stepping. Nat. Struct. Mol. Biol. 19, 193–200 (2012).
30. DeWitt, M. A., Chang, A. Y., Combs, P. A. & Yildiz, A. Cytoplasmic dynein moves through uncoordinated stepping of the AAA+ ring domains. Science 335, 221–225 (2012).
31. Liu, M. et al. Type II kinase inhibitors show an unexpected inhibition mode against Parkinson’s disease-linked LRRK2 mutant G2019S. Biochemistry 52, 1725–1736 (2013).
32. Canning, P. et al. Inflammatory signaling by NOD-RIPK2 is inhibited by clinically relevant type II kinase inhibitors. Chem. Biol. 22, 1174–1184 (2015).
33. Ren, X. et al. Identification of GZD824 as an orally bioavailable inhibitor that targets phosphorylated and nonphosphorylated breakpoint cluster region-Abelson (Bcr-Abl) kinase and overcomes clinically acquired mutation-induced resistance against imatinib. J. Med. Chem. 56, 879–894 (2013).
34. Deng, X. et al. Characterization of a selective inhibitor of the Parkinson’s disease kinase LRRK2. Nat. Chem. Biol. 7, 203–205 (2011).
35. Gilsbach, B. K. et al. Structural characterization of LRRK2 inhibitors. J. Med. Chem. 58, 3751–3756 (2015).
36. Godena, V. K. et al. Increasing microtubule acetylation rescues axonal transport and locomotor deficits caused by LRRK2 Roc-COR domain mutations. Nat. Commun. 5, 5245 (2014).
37. Blanca Ramírez, M. et al. GTP binding regulates cellular localization of Parkinson’s disease-associated LRRK2. Hum. Mol. Genet. 26, 2747–2767 (2017).
38. Schmidt, S. H. et al. The dynamic switch mechanism that leads to activation of LRRK2 is embedded in the DFGψ motif in the kinase domain. Proc. Natl Acad. Sci. USA 116, 14979–14988 (2019).
39. Wang, Y. et al. CRACR2a is a calcium-activated dynein adaptor protein that regulates endocytic traffic. J. Cell Biol. 218, 1619–1633 (2019).
40. Etoh, K. & Fukuda, M. Rab10 regulates tubular endosome formation through KIF13A and KIF13B motors. J. Cell Sci. 132, jcs226977 (2019).
41. Horgan, C. P., Hanscom, S. R., Jolly, R. S., Futter, C. E. & McCaffrey, M. W. Rab11-FIP3 links the Rab11 GTPase and cytoplasmic dynein to mediate transport to the endosomal-recycling compartment. J. Cell Sci. 123, 181–191 (2010).
42. Niwa, S., Tanaka, Y. & Hirokawa, N. KIF1Bβ- and KIF1A-mediated axonal transport of presynaptic regulator Rab3 occurs in a GTP-dependent manner through DENN/MADD. Nat. Cell Biol. 10, 1269–1279 (2008).
43. Matanis, T. et al. Bicaudal-D regulates COPI-independent Golgi-ER transport by recruiting the dynein-dynactin motor complex. Nat. Cell Biol. 4, 986–992 (2002).
44. Sui, H. & Downing, K. H. Structural basis of interprotofilament interaction and lateral deformation of microtubules. Structure 18, 1022–1031 (2010).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© The Author(s), under exclusive licence to Springer Nature Limited 2020

Methods

Data reporting
No statistical methods were used to predetermine sample size, and experiments were not randomized.

Cloning, plasmid construction and mutagenesis
For baculovirus expression, the DNA coding for wild-type LRRK2 residues 1327 to 2527 (taken from Mammalian Gene Collection) was PCR-amplified using the forward primer TACTTCCAATCCATGAAAAAGGCTGTGCCTTATAACCGA and the reverse primer TATCCACCTTTACTGTCACTCAACAGATGTTCGTCTCATTTTTTCA. The T4 polymerase-treated amplicon was inserted into the expression vector pFB-6HZB by ligation-independent cloning. According to Bac-to-Bac expression system protocols (Invitrogen), this plasmid was used for the generation of recombinant baculoviruses.

For mammalian expression vectors, pDEST53-GFP-LRRK2 (WT)45 from Addgene (25044) was used. pDEST53-GFP-LRRK2 (I2020T) was cloned using QuikChange site-directed mutagenesis (Agilent) with the forward primer AAGATTGCTGACTACGGCACTGCTCAGTACTGCTG and the reverse primer CAGCAGTACTGAGCAGTGCCGTAGTCAGCAATCTT. pET17b-Kif5b(1-560)-GFP-His46 was obtained from Addgene (15219). For pET28a-ZZ-TEV-Halo-NINL1–702, Ninein-like1–702 (NINL) was synthesized as previously described25 and inserted into a pET28a expression vector with a synthesized ZZ-TEV-Halo gBlock fragment (IDT) using Gibson assembly.

LRRK2RCKW expression and purification
The expression construct contained an N-terminal His6-Z-tag, cleavable with TEV protease (Extended Data Fig. 1a–c). For LRRK2RCKW purification, the pelleted Sf9 cells were washed with PBS, resuspended in lysis buffer (50 mM HEPES pH 7.4, 500 mM NaCl, 20 mM imidazole, 0.5 mM TCEP, 5% glycerol, 5 mM MgCl2, 20 μM GDP) and lysed by sonication. The lysate was cleared by centrifugation and loaded onto a Ni-NTA (Qiagen) column. After vigorous rinsing with lysis buffer, the His6-Z-tagged protein was eluted in lysis buffer containing 300 mM imidazole. Immediately thereafter, the eluate was diluted with a buffer containing no NaCl, to reduce the NaCl concentration to 250 mM, and loaded onto an SP sepharose column. His6-Z-TEV-LRRK2RCKW was eluted with a 250 mM to 2.5 M NaCl gradient and treated with TEV protease overnight to cleave the His6-Z-tag. Contaminating proteins, the cleaved tag, uncleaved protein and TEV protease were removed by another combined SP sepharose Ni-NTA step. Finally, LRRK2RCKW was concentrated

of 20 mM HEPES pH 7.4, 80 mM NaCl, 0.5 mM TCEP, 5% glycerol, 2.5 mM MgCl2 and 20 μM GDP and then diluted to a final concentration of 4 μM in the same buffer. We have only observed the trimer species when preparing grids with LRRK2RCKW at concentrations of 4 μM or above and for our apo samples (that is, in the absence of inhibitors). This sample was applied to glow-discharged (20 mA for 20 s in a K100 Instrument) UltrAuFoil Holey Gold R 1.2/1.3 grids (Quantifoil). A Vitrobot (FEI) was then used to blot away excess sample and plunge freeze the grids in liquid ethane. Grids were stored in liquid nitrogen until imaged.

Cryo-EM data were collected at UCLA California NanoSystems Institute in a Titan Krios (FEI) operated at 300 kV, equipped with a K2 Summit direct electron detector (Gatan) and a Quantum energy filter (Gatan). Automated data collection was performed using Leginon47. We recorded a total of 3,824 movies in ‘counting mode’ at a dose rate of 6.65 electrons Å−2 s−1 with a total exposure time of 8 s sub-divided into 200-ms frames, for a total of 40 frames. The images were recorded at a nominal magnification of 130,000×, resulting in an object pixel size of 1.07 Å. The defocus range of the data was −1 μm to −1.8 μm.

Electron microscopy map and model generation of trimer dataset. We aligned the movie frames using UCSF MotionCor248, using the dose-weighted frame alignment option. We estimated the CTF on dose-weighted images using GCTF version 1.0649 as implemented in Appion50 with per-particle CTF generation. Images having CTF fits worse than 5 Å (as determined by GCTF) were excluded from further processing. Using this approach, 3,693 micrographs were kept for further processing. We selected particles from micrographs using FindEM51 with projections of a trimeric LRRK2RCKW map, created from an initial CryoSparc ab initio model generation, serving as a reference. Particle picking was performed within the framework of Appion, resulting in a dataset of 836,956 particles.

We carried out subsequent processing first in Relion 3.052 then in CryoSparc253. A series of 2D and 3D classifications were performed as shown in Extended Data Fig. 1f to generate the final map. The initial reference was created in a similar manner to those used for template picking, from an initial ab initio model generated in CryoSparc. All references were filtered to either 60 Å (default in Relion) or 30 Å (default in CryoSparc) before refinement processes. The final map, generated in CryoSparc2 using non-uniform refinement and while applying C3 symmetry, reached 3.47 Å resolution. Initial 2D classifications used binned data (4.28 Å pixel−1) while all subsequent 3D classifications and refinement steps used unbinned images (1.07 Å pixel−1).

To improve the density corresponding to the ROC and COR-A
and subjected to gel filtration in storage buffer (20 mM HEPES pH 7.4, domains, a second map was generated following a different process-
800 mM NaCl, 0.5 mM TCEP, 5% glycerol, 2.5 mM MgCl2, 20 μM GDP) ing scheme, which used signal subtraction and is shown in Extended
using an AKTA Xpress system combined with an S200 gel filtration Data Fig. 2a. The final refinement led to a 3.8 Å map of LRRK2RCKW. All
column. The final yield as calculated from UV absorbance was 1.2 mg l−1 the steps used unbinned images (1.07 Å pixel−1).
LRRK2RCKW insect cell medium. The resolutions of the cryo-EM maps, here and below, were esti-
mated from Fourier shell correlation (FSC) curves calculated using the
SEC–MALS gold-standard procedure and the resolutions are reported according to
SEC–MALS experiments were performed using an ÄKTAmicro chroma- the 0.143 cut-off criterion54–56. FSC curves were corrected for the convo-
tography system hooked up to a Superdex 200 Increase 3.2/300 size lution effects of a soft mask applied to the half maps by high-resolution
exclusion chromatography column coupled in-line to a DAWN HELEOS phase randomization57. For display and analysis purposes, we sharp-
II multiangle light scattering detector (Wyatt Technology) and an Opti- ened the maps with automatically estimated negative B factors from
lab T-rEX refractive index detector (Wyatt Technology). SEC–MALS Relion or CryoSparc2.
was performed in 50 mM HEPES pH 7.4, 200 mM NaCl 0.5 mM TCEP, We built the LRRK2RCKW models using both the 3.47 Å C3 and 3.8 Å
5% glycerol, 5 mM MgCl2, and 20 μM GDP. For a typical sample, 50 μl of density-subtracted maps. We used a combination of Rosetta58 and
approximately 7 μM LRRK2RCKW was injected onto the column. Molar manual building in Coot59 to build all models. Starting models were
mass was calculated using ASTRA 6 software, with protein concentration found via a sequence alignment search in HHpred60 and each sec-
derived from the Optilab T-rEX. LRRK2RCKW used for SEC-MALS experi- tion was built from the following: (1) the ROC and COR domains were
ments contained an extra 16 residue N-terminal Gly-Ser linker sequence. built using the top two alignment results, 3DPT_B and 3DPU_B, which
scored much higher than those ranked third and below; (2) the kinase
Electron microscopy N-lobe, kinase C-lobe, and WD40 domain were all built from the top
Electron microscopy sample preparation and imaging of trimer five sequence matches: 5JGA_A, 1OPK_A, 4BTF_A, 5GZA_A, 5K00_A for
dataset. Purified LRRK2RCKW was dialysed into a final buffer consisting the kinase N-lobe; 5CYZ_A, 2IZR_A, 5YKS_B, 3S95_B, 4TWC_B for the
kinase C-lobe; and 4J87_A, 3OW8_C, 5U69_A, 5O9Z_F, 5T2A_7 for the WD40 domain. The C-terminal helix was built manually. We built the COR-B, KIN and WD40 domains using the density of the 3.47 Å C3 map and the ROC and COR-A domains using the 3.8 Å map, then connected the two after fitting them into the 3.8 Å map. Finally, we performed multiple iterations of both the CM and Relax functions in Rosetta, along with manual manipulation in Coot, to build our final 20 models (10 including GDP-Mg2+ in the ROC domain, and 10 excluding it). In areas of weak density, we either removed part of the polypeptide chain or, where only side-chain density was poor, converted the chain to poly-alanine.
The GDP-Mg2+ was placed in our model by initially aligning a structure of the ROC domain containing a bound GDP-Mg2+ (PDB code 2ZEJ)8 to the ROC domain in our structure. The GDP-Mg2+ was then added to our model in the aligned position and run through Rosetta to allow for fine movements into our density and re-arrangement of nearby chains.
The phosphorylation of the threonine residue 1343, which we observe in our map and which is a known phosphorylation site of Roco family GTPases61, was confirmed by phosphor-enrichment mass spectrometry (data not shown).

Electron microscopy sample preparation, imaging, and processing of apo, MLi-2, ponatinib LRRK2RCKW monomer or dimers. For all samples, purified LRRK2RCKW was dialysed into the same final buffer as described for the trimer data, and then diluted to its final concentration in the same buffer. Unlike with the trimer data, however, the following datasets were collected from multiple grids prepared using slightly different sample conditions and imaged using a range of microscope settings.
The apo LRRK2RCKW monomer dataset had sample diluted to final concentrations ranging between 1 μM and 6 μM. In addition, one dataset was collected with the grid tilted to 30° to overcome preferred orientation issues. Otherwise, grids were prepared as described for the trimer dataset.
The apo LRRK2RCKW dimer dataset had sample diluted to final concentrations ranging between 4 μM and 12 μM. Two of the samples contained 0.05 mM digitonin (Sigma, D141) or 0.03% octyl glucoside (Sigma, O8001) detergents to overcome preferred orientation issues. One dataset was collected with the grid tilted to 30°, also to overcome preferred orientation issues. Otherwise, grids were prepared as described for the trimer dataset.
The MLi-2 LRRK2RCKW dimer dataset had sample diluted to final concentrations of either 3 μM or 4 μM. MLi-2 was added post-dialysis to a final concentration of 5 μM. The sample was incubated on ice for at least 1 h before being applied to the grid. Otherwise, grids were prepared as described for the trimer dataset.
The ponatinib LRRK2RCKW dimer dataset had sample diluted to final concentrations of 2 μM or 4 μM. Ponatinib was added post-dialysis at a concentration of either 5 μM or 100 μM. The sample was incubated on ice for at least 1 h before being applied to the grid. Otherwise, grids were prepared as described for the trimer dataset.
The apo LRRK2RCKW monomer cryo-EM data were collected on a Talos Arctica (FEI) operated at 200 kV, equipped with a K2 Summit direct electron detector (Gatan). Automated data collection was performed using Leginon47. A total of 11,354 movies were collected. We imaged samples at dose rates between 4.2 and 10 electrons Å−2 s−1 with total exposure times ranging from 6 s to 12 s sub-divided into 200-ms frames, for a total of 30 or 60 frames. All the images were recorded at a nominal magnification of 36,000× (either counting mode or super-resolution mode), resulting in object pixel sizes of either 1.16 Å or 0.58 Å, respectively. The defocus range of the data was −1 μm to −2 μm.
Frame alignment, CTF estimation and image selection were performed as described for the trimer dataset, except that per-particle CTF was not used; instead, the CTF information of the whole image was used. After selection, we had 7,067 micrographs. Particles were extracted using crYOLO45. We carried out subsequent processing in CryoSparc2 (ref. 53) on binned images (2.32 Å pixel−1).
A series of 2D and 3D classifications were performed as shown in Extended Data Fig. 6 to generate the final map. The initial monomer reference, used for refinement of the monomer map, was generated from our LRRK2RCKW model. The initial dimer references, used for particle sorting, were generated as shown in Extended Data Fig. 5a–e. All references were filtered to 30 Å, the default value in CryoSparc2, before refinement processes. This final map reached a resolution of 8.08 Å using non-uniform refinement.
The apo LRRK2RCKW dimer cryo-EM data were collected as described for the apo LRRK2RCKW monomer. A total of 5,303 movies were collected. We imaged samples at dose rates between 4.6 and 7.8 electrons Å−2 s−1 with total exposure times ranging between 7 s and 11 s sub-divided into 200-ms frames, for a total of 35 or 55 frames. All the images were recorded at a nominal magnification of 36,000× (either counting mode or super-resolution mode), resulting in object pixel sizes of either 1.16 Å or 0.58 Å, respectively. The defocus range of the data was −1 μm to −2 μm.
Frame alignment, CTF estimation, image selection, and particle picking were performed as described for the apo LRRK2RCKW monomer. After selection we had 3,100 micrographs. We carried out subsequent processing in CryoSparc2 (ref. 53) on binned images (2.32 Å pixel−1).
The classification and refinement scheme for the apo LRRK2RCKW WD40- and COR-mediated dimer maps is shown in Extended Data Fig. 6. The same initial dimer references used for the apo LRRK2RCKW monomer, filtered to the same resolution, were used here for refinement of the dimer maps. In addition, a linear trimer reference, generated by combining two initial dimer references and filtered to the same resolution, was used during the initial 3D classification step to sort out species longer than dimer, which negatively affected subsequent alignment. The final maps had resolutions of 13.39 Å (WD40-mediated dimer) and 9.52 Å (COR-mediated dimer) with C2 symmetry applied to both.
The MLi-2 LRRK2RCKW dimer cryo-EM data were collected as described for the apo LRRK2RCKW monomer. We recorded a total of 4,139 movies. We imaged all datasets in ‘counting mode’ at a dose rate of 5.5 electrons Å−2 s−1, with total exposure times of either 9 s or 10 s sub-divided into 200-ms frames, for a total of 45 or 50 frames. All the images were recorded at a nominal magnification of 36,000× (counting mode), resulting in object pixel sizes of 1.16 Å. The defocus range of the data was −1 μm to −2 μm.
Frame alignment, CTF estimation, image selection, and particle picking were performed as described for the apo LRRK2RCKW monomer. After selection, 4,030 micrographs were kept for further processing. Processing was done in Relion 3.0 (ref. 52) then CryoSparc2 (ref. 53) using binned images (2.32 Å pixel−1).
The classification and refinement scheme for the MLi-2 LRRK2RCKW WD40- and COR-mediated dimer maps is shown in Extended Data Fig. 5. The same references used for the apo LRRK2RCKW dimers, and filtered to the same resolution, were used here for the same purposes. The final maps had resolutions of 9.74 Å (WD40-mediated dimer) and 9.04 Å (COR-mediated dimer), with no symmetry applied.
The ponatinib LRRK2RCKW dimer cryo-EM data were collected as described for the apo LRRK2RCKW monomer. We recorded a total of 1,797 movies. We imaged all datasets at dose rates of either 5.5 or 9.7 electrons Å−2 s−1 with total exposure times of 7 s or 10 s sub-divided into 200-ms frames, for a total of 35 or 50 frames. All the images were recorded at a nominal magnification of 36,000× (counting mode), resulting in object pixel sizes of 1.16 Å. The defocus range of the data was −1 μm to −2 μm.
Frame alignment, CTF estimation, image selection, and particle picking were performed as described for the apo LRRK2RCKW monomer. A total of 1,455 micrographs were kept for further processing. Processing was done in CryoSparc2 (ref. 53) on binned images (2.32 Å pixel−1). Ponatinib
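The acquisition figures quoted for the datasets above follow from simple arithmetic: the frame count is the exposure time divided by the 200-ms frame duration, the accumulated dose is the dose rate multiplied by the exposure time, and a binned pixel size is the unbinned pixel size multiplied by the binning factor. The Python sketch below checks a few of the reported values. The function names are illustrative assumptions and are not taken from the authors' processing scripts.

```python
# Illustrative helpers; names are hypothetical, not from the authors' pipeline.

def frame_count(exposure_s: float, frame_ms: int = 200) -> int:
    """Number of movie frames, computed in integer milliseconds to avoid
    floating-point division artefacts."""
    return round(exposure_s * 1000) // frame_ms


def total_dose(dose_rate: float, exposure_s: float) -> float:
    """Accumulated dose (electrons per square angstrom) given a dose rate
    in electrons per square angstrom per second."""
    return dose_rate * exposure_s


def binned_pixel_size(pixel_size_angstrom: float, binning: int) -> float:
    """Pixel size (angstrom per pixel) after binning by an integer factor."""
    return pixel_size_angstrom * binning


if __name__ == "__main__":
    # Exposure times mentioned in the text, all with 200-ms frames.
    for exposure in (6, 7, 8, 9, 10, 11, 12):
        print(f"{exposure} s exposure: {frame_count(exposure)} frames")
    # Trimer dataset: 6.65 electrons per square angstrom per second for 8 s.
    print(f"trimer dose: {total_dose(6.65, 8):.1f}")
    # Binning used for 2D classification and for the monomer/dimer datasets.
    print(f"1.07 binned x4: {binned_pixel_size(1.07, 4):.2f}")
    print(f"1.16 binned x2: {binned_pixel_size(1.16, 2):.2f}")
```

For the trimer dataset this gives 40 frames and an accumulated dose of about 53.2 electrons Å−2. The fourfold-binned 1.07 Å pixel comes out at 4.28 Å and the twofold-binned 1.16 Å pixel at 2.32 Å, in line with the frame counts and binned pixel sizes reported in the text.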
LRRK2RCKW dimer particles were sorted via 2D classification leading to the final 2D averages shown in Extended Data Fig. 6b.

Electron microscopy sample preparation and imaging of microtubule-associated LRRK2RCKW. Purified LRRK2RCKW and unpolymerized bovine tubulin were added together to the microtubule polymerization buffer (1× BRB80, 1 mM DTT, 10% DMSO, 1 mM GTP, 1 mM MgCl2, 10 μM Taxol). LRRK2RCKW was added in a 2× molar excess relative to the α/β tubulin dimer during the polymerization procedure. To promote the interaction of LRRK2RCKW with microtubules, NaCl concentration was capped at 90 mM; this set an upper limit to the LRRK2RCKW concentrations we could use. For wild-type LRRK2RCKW, we used 2.5 μM LRRK2RCKW and 1.25 μM tubulin. For the I2020T variant, we used 4.5 μM LRRK2RCKW(I2020T) and 2.25 μM tubulin. The mixture was incubated for 1 h at room temperature and was then diluted threefold into cryo-EM buffer (20 mM HEPES pH 7.4, 80 mM NaCl, 0.5 mM TCEP, 2.5 mM MgCl2, 10 μM taxol) immediately before application to the grid and plunge-freezing in a Vitrobot (FEI). The grids (Lacey Carbon on copper, 300 mesh; EMS) were glow-discharged (20 mA for 40 s in a K100 Instrument) before the sample was applied.
Cryo-EM data were collected on a Talos Arctica (FEI) operated at 200 kV, equipped with a K2 Summit direct electron detector (Gatan) using Leginon47. We imaged at a dose rate of 5.85 electrons Å−2 s−1 with a total exposure time of 10 s sub-divided into 200-ms frames, for a total of 50 frames. All the images were recorded at a nominal magnification of 36,000×, resulting in an object pixel size of 1.16 Å. The defocus range of the data was −1 μm to −2 μm. Images were aligned using MotionCor2 (ref. 48) with the dose-weighted frame alignment option.
For layer line analysis, filaments were selected and extracted into 600 Å boxes using Relion 3.0 (ref. 52) from the dose-corrected images. The images were binned fourfold into a pixel size of 4.64 Å and a final box size of 164 pixels. Before calculating the Fourier transforms, the images were padded with an additional 82 pixels on all sides.

Building the molecular model of microtubule-associated LRRK2RCKW filaments
Given that the WD40 densities are clearly identifiable in the cryo-ET map of microtubule-associated LRRK2, we used these as a starting point for docking the structure of LRRK2RCKW into the cryo-ET map. First, a synthetic dimer of the WD40 domains from the LRRK2RCKW structure was generated by aligning them to a crystal structure of the isolated WD40 domain, which formed a dimer in the crystal (PDB code 6DLP)9. This synthetic dimer was then docked into the cryo-ET map in Chimera62, using the Fit in Map function with the options of filtering the structure to the resolution of the map (14 Å) and optimizing correlation. A WD40 dimer was placed into each of the two corresponding densities present in the map. Then, four copies of the LRRK2RCKW structure were added by aligning their WD40 domains to those previously docked into the cryo-ET map. The same procedure was followed to build the filament using the ‘closed’ kinase model of LRRK2RCKW (see ‘Modelling a closed kinase version of LRRK2RCKW’ for how that model was generated).
Backbone clashes at the COR-mediated interface in the model filament were measured in Chimera62 (with default settings) after converting the four LRRK2RCKW monomers to poly-alanine models.

Modelling a closed kinase version of LRRK2RCKW
To identify a good reference to model the closed state of the LRRK2RCKW kinase, we ran separate structural searches (using the DALI server) with the N- and C-lobes of the LRRK2RCKW kinase domain. We looked through the matches for a kinase that scored highly with both lobes, and whose structure is in a closed state. We selected interleukin-2 inducible T-cell kinase (ITK) bound to an inhibitor as our reference (PDB code 3QGY)63. LRRK2RCKW was split at the junction between the N- and C-lobes of its kinase domain (L1949–A1950), resulting in one half containing the ROC, COR, and kinase (N-lobe) domains and another containing the kinase (C-lobe) and WD40 domains. The C-lobe of 3QGY was then aligned (in Chimera) to the C-lobe of the LRRK2RCKW kinase domain, and subsequently the N-lobe of the LRRK2RCKW kinase was aligned to the N-lobe of 3QGY. The two halves were then combined to generate the closed kinase model of LRRK2RCKW.

Docking of LRRK2RCKW into cryo-EM maps of monomers and dimers
To build models of WD40- and COR-mediated dimers of LRRK2RCKW in the presence of MLi-2, we again split LRRK2RCKW at the junction between the N- and C-lobes (L1949–A1950). The two halves were fitted into one half of the cryo-EM map of a WD40-mediated dimer of LRRK2RCKW obtained in the presence of MLi-2 (we chose this map as its resolution was higher than that of the COR-mediated dimer). We also docked the two halves of LRRK2RCKW into a cryo-EM map of an LRRK2RCKW monomer obtained in the absence of inhibitor. The fitting was done in Chimera62 using the Fit in Map function with the options of filtering the structure to the resolution of the map and optimizing correlation. The two halves were then joined to generate a full model of LRRK2RCKW.
The WD40- and COR-mediated dimers of LRRK2RCKW in the presence of MLi-2 were built by docking the models built above into the corresponding cryo-EM maps, using the same approach in Chimera62 as outlined above.
The LRRK2RCKW filament shown in Extended Data Fig. 7f was generated by aligning, in alternating order, several copies of the two dimer models (WD40- and COR-mediated) built into the cryo-EM maps obtained in the presence of MLi-2.

Kinase inhibitors
Stocks of the kinase inhibitors MLi-2 (10 mM; Tocris), ponatinib (10 mM; ApexBio), GZD-824 (10 mM; Cayman Chemical) and LRRK2-IN-1 (2 mM; Michael J. Fox Foundation) were stored in DMSO at −20 °C.

Antibodies
All antibodies used for immunocytochemistry were diluted to 1:500. Primary antibodies used were chicken anti-GFP (Aves Labs) and rabbit anti-α-tubulin (ProteinTech). Secondary antibodies used were goat anti-chicken-Alexa 488 (ThermoFisher) and goat anti-chicken Alexa568 (ThermoFisher). DAPI was used at 1:5,000 according to the manufacturer’s suggestions (ThermoFisher). Primary antibodies used for western blots were mouse anti-GFP (Santa Cruz, 1:1,000 dilution), mouse anti-GAPDH (ProteinTech, 1:5,000 dilution) and mouse anti-gamma-tubulin (ProteinTech, 1:5,000 dilution). Secondary antibodies (1:15,000) used for western blots were IRDye goat anti-mouse 680RD and IRDye goat anti-rabbit 780RD (Li-COR).

Rab8a expression and purification
N-terminally tagged (His6-ZZ) Rab8A containing a TEV cleavage site was cloned into a pET28a expression vector and expressed in BL21(DE3) Escherichia coli cells. Transformed cells were grown overnight at 37 °C in 10 ml LB medium containing kanamycin (50 μg ml−1), then diluted into 200 ml LB medium containing kanamycin (50 μg ml−1), grown to an optical density at 600 nm of approximately 1–2, diluted into 4 l LB medium containing kanamycin (50 μg ml−1), and grown to an optical density at 600 nm of 0.4. IPTG was added (final concentration 0.5 mM) to induce protein expression for around 18 h at 18 °C. Cells were collected by centrifugation at 8,983g for 10 min at 4 °C, followed by resuspension in 15 ml LB medium and centrifugation at 2,862g for 10 min at 4 °C. The cell pellet was flash frozen in liquid nitrogen and stored at −80 °C. For a typical protein purification, cell pellets were resuspended in lysis buffer (50 mM HEPES pH 7.4, 200 mM NaCl, 2 mM DTT, 10% glycerol, 5 mM MgCl2, 0.5 mM Pefabloc, and protease inhibitor cocktail tablets) and lysed by sonication on ice. The lysate was clarified by centrifugation at 164,700g for 40 min at 4 °C and then incubated with Ni-NTA agarose beads (Qiagen) for 1 h at 4 °C. Beads were extensively washed with wash
buffer (50 mM HEPES pH 7.4, 150 mM NaCl, 2 mM DTT, 10% glycerol, 5 mM MgCl2); His6-ZZ-Rab8a was eluted in 40 ml elution buffer (50 mM HEPES pH 7.4, 150 mM NaCl, 300 mM imidazole, 2 mM DTT, 10% glycerol, 5 mM MgCl2). The protein eluate was diluted twofold in wash buffer, incubated for 2.5 h at 4 °C with IgG sepharose 6 fast flow beads equilibrated in wash buffer, and washed extensively in wash buffer. Protein-bound IgG beads were then transferred into TEV buffer (50 mM HEPES pH 7.4, 200 mM NaCl, 2 mM DTT, 10% glycerol, 5 mM MgCl2), and untagged Rab8A was cleaved off the IgG sepharose beads by incubation with TEV protease at 4 °C overnight. The next day, cleaved Rab8A was separated from His6-TEV protease and any remaining uncleaved protein or residual tag by incubation with Ni-NTA agarose beads (Qiagen), followed by washing with TEV buffer containing 25 mM imidazole. Lastly, purified Rab8A was run over a Superdex 200 Increase 10/300 size exclusion column equilibrated in S200 buffer (50 mM HEPES pH 7.4, 200 mM NaCl, 2 mM DTT, 1% glycerol, 5 mM MgCl2), and concentrated and exchanged into buffer containing 10% glycerol for storage at −80 °C.

In vitro phosphorylation of Rab8A by LRRK2RCKW
Purified Rab8A (approximately 3.8 μM) was phosphorylated by LRRK2RCKW (38 nM) in a buffer containing 50 mM HEPES pH 7.4, 80 mM NaCl, 10 mM MgCl2, 0.5 mM TCEP, 1 mM ATP and 200 μM GDP; 34 μl reaction mixtures containing kinase inhibitor or an equivalent volume of DMSO were incubated at 30 °C, and samples were taken at 45 min and 90 min. An effective reaction volume of 0.75 μl was run on a 4–12% Bis-Tris protein gel, transferred to nitrocellulose, and blotted with a commercially available antibody to pT72-Rab8a (MJFF-pRab8) as previously described49 and per manufacturer’s instructions, with the exception that HRP-labelled secondary antibody was used at a dilution of 1:2,000.

Purification of molecular motors
Protein purification steps were done at 4 °C unless otherwise indicated. Human KIF5B1–560(K560)–GFP was purified from E. coli using an adapted protocol previously described64. pET17b-Kif5b(1-560)-GFP-His was transformed into BL-21[DE3] RIPL cells (New England Biolabs), which were grown to an optical density at 600 nm of 0.6–0.8, and expression was induced with 0.5 mM IPTG for 16 h at 18 °C. Frozen pellets from 2 l culture were resuspended in 40 ml lysis buffer (50 mM Tris, 300 mM NaCl, 5 mM MgCl2, and 0.2 M sucrose, pH 7.5) supplemented with 1 cOmplete EDTA-free protease inhibitor cocktail tablet (Roche) per 50 ml and 1 mg ml−1 lysozyme. The resuspension was incubated on ice for 30 min and lysed by sonication. The sonicate was supplemented with 10 mM imidazole and 0.5 mM PMSF and clarified by centrifuging at 30,000g for 30 min in a Type 70 Ti rotor (Beckman). The clarified supernatant was incubated with 5 ml Ni-NTA agarose (Qiagen) and rotated in a nutator for 1 h. The mixture was washed with 30 ml wash buffer (50 mM Tris, 300 mM NaCl, 5 mM MgCl2, 0.2 M sucrose, and 20 mM imidazole, pH 7.5) by gravity flow. Beads were resuspended in elution buffer (50 mM Tris, 300 mM NaCl, 5 mM MgCl2, 0.2 M sucrose, and 250 mM imidazole, pH 8.0), incubated for 5 min, and eluted stepwise in 0.5 ml increments. Peak fractions were combined and buffer exchanged on a PD-10 desalting column (GE Healthcare) equilibrated with storage buffer (80 mM PIPES, 2 mM MgCl2, 1 mM EGTA, and 0.2 M sucrose, pH 7.0). From this, peak fractions of motor solution were either flash frozen and stored at −80 °C until further use or immediately subjected to microtubule bind and release purification. A total of 1 ml motor solution was incubated with 1 mM AMP-PNP and 20 μM taxol on ice for 5 min and warmed to room temperature. For microtubule bind and release, polymerized bovine brain tubulin was centrifuged through a glycerol cushion (80 mM PIPES, 2 mM MgCl2, 1 mM EGTA, and 60% glycerol (v/v) with 20 μM taxol and 1 mM DTT), resuspended as previously described25 and incubated with motor solution in the dark for 15 min at room temperature. The motor-microtubule mixture was laid on top of a glycerol cushion and centrifuged in a TLA120.2 rotor at 278,835g for 12 min at room temperature. The final pellet (kinesin-bound microtubules) was washed with BRB80 (80 mM PIPES, 2 mM MgCl2, and 1 mM EGTA, pH 7.0) and incubated in 100 μl of release buffer (80 mM PIPES, 2 mM MgCl2, 1 mM EGTA, and 300 mM KCl, pH 7 with 5 mM Mg-ATP) for 5 min at room temperature. The supernatant was supplemented with 660 mM sucrose and flash frozen. A typical kinesin prep yielded 0.5 to 1 μM K560–GFP dimer.
Human dynactin was purified from stable cell lines expressing p62-Halo-3×Flag as previously described65,66. In brief, cells were collected from 160 × 15 cm plates and resuspended in 80 ml of dynactin-lysis buffer (30 mM HEPES, pH 7.4, 50 mM potassium acetate, 2 mM magnesium acetate, 1 mM EGTA, 1 mM DTT, 10% (v/v) glycerol) supplemented with 0.5 mM Mg-ATP, 0.2% Triton X-100 and 1 cOmplete EDTA-free protease inhibitor cocktail tablet (Roche) per 50 ml and rotated slowly for 15 min. The lysate was clarified by centrifuging at 66,000g for 30 min in a Type 70 Ti rotor (Beckman). The clarified supernatant was incubated with 1.5 ml of anti-Flag M2 affinity gel (Sigma-Aldrich) overnight on a roller. The beads were transferred to a gravity flow column, washed with 50 ml of wash buffer (dynactin-lysis buffer supplemented with 0.1 mM Mg-ATP, 0.5 mM Pefabloc and 0.02% Triton X-100), 100 ml of wash buffer supplemented with 250 mM potassium acetate, and again with 100 ml of wash buffer. Dynactin was eluted from beads with 1 ml of elution buffer (wash buffer with 2 mg ml−1 of 3×Flag peptide). The eluate was collected, filtered by centrifuging with Ultrafree-MC VV filter (EMD Millipore) in a tabletop centrifuge, diluted to 2 ml in buffer A (50 mM Tris-HCl, pH 8.0, 2 mM magnesium acetate, 1 mM EGTA and 1 mM DTT) and injected onto a MonoQ 5/50 GL column (GE Healthcare and Life Sciences) at 1 ml min−1. The column was pre-washed with 10 column volumes (CV) of buffer A, 10 CV of buffer B (50 mM Tris-HCl, pH 8.0, 2 mM magnesium acetate, 1 mM EGTA, 1 mM DTT and 1 M potassium acetate) and again with 10 CV of buffer A at 1 ml min−1. To elute, a linear gradient was run over 26 CV from 35–100% buffer B. Pure dynactin complex eluted from 75–80% buffer B. Peak fractions containing pure dynactin complex were pooled, buffer exchanged into a GF150 buffer supplemented with 10% glycerol, concentrated to 0.02–0.1 mg ml−1 using a 100K MWCO concentrator (EMD Millipore) and flash frozen in liquid nitrogen. Typical dynactin prep yields are between 150 and 300 nM.
Human dynein was purified from stable cell lines expressing an IC2-SNAPf-3×Flag as previously described25. Frozen pellets collected from approximately 60–100 15-cm plates were resuspended in dynein lysis buffer (25 mM HEPES pH 7.4, 50 mM potassium acetate, 2 mM magnesium acetate, 1 mM EGTA, 10% glycerol (v/v) and 1 mM DTT) supplemented with 0.2% Triton X-100, 0.5 mM Mg-ATP, and cOmplete EDTA-free protease inhibitor cocktail. The lysate was centrifuged at 66,000g in a Ti-70 rotor for 30 min. The clarified supernatant was incubated with 1 ml of anti-Flag M2 affinity gel (Sigma-Aldrich) overnight on a roller. Beads were collected by gravity flow and washed with 50 ml wash buffer (dynein lysis buffer with 0.02% Triton X-100 and 0.5 mM Mg-ATP) supplemented with protease inhibitors (cOmplete Protease Inhibitor Cocktail, Roche). Beads were then washed with 50 ml high-salt wash buffer (25 mM HEPES, pH 7.4, 300 mM potassium acetate, 2 mM magnesium acetate, 10% glycerol, 1 mM DTT, 0.02% Triton X-100, 0.5 mM Mg-ATP), and then with 100 ml wash buffer. For labelling, beads were resuspended in 1 ml wash buffer and incubated with 5 μM SNAP-Cell TMR Star (New England BioLabs) for 10 min on the column at room temperature. Unbound dye was removed with 100 ml wash buffer at 4 °C. Dynein was eluted with 1 ml of elution buffer (wash buffer containing 2 mg ml−1 3×Flag peptide). The eluate was collected, diluted to 2 ml in buffer A (50 mM Tris pH 8.0, 2 mM magnesium acetate, 1 mM EGTA and 1 mM DTT) and injected onto a MonoQ 5/50 GL column (GE Healthcare Life Sciences) at 0.5 ml min−1. The column was washed with 20 CV of buffer A at 1 ml min−1. To elute, a linear gradient was run over 40 CV into buffer B (50 mM Tris pH 8.0, 2 mM magnesium acetate, 1 mM EGTA, 1 mM DTT, 1 M potassium acetate). Pure dynein
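As a rough guide to the MonoQ elution conditions reported for dynactin, the position along a linear gradient and the corresponding salt concentration can be estimated as below. This is a minimal sketch that assumes ideal linear mixing of buffer A (no potassium acetate) and buffer B (1 M potassium acetate) and ignores system dead volume. The helper names are hypothetical and do not come from the source.

```python
# Hypothetical helpers; assume ideal linear mixing of buffers A and B
# and no delay (dead) volume between the mixer and the column.

def percent_b(cv: float, start_pct: float, end_pct: float, length_cv: float) -> float:
    """Percentage of buffer B delivered after `cv` column volumes of a
    linear gradient running from start_pct to end_pct over length_cv."""
    if not 0 <= cv <= length_cv:
        raise ValueError("cv must lie within the gradient")
    return start_pct + (end_pct - start_pct) * cv / length_cv


def cv_at_percent_b(pct: float, start_pct: float, end_pct: float, length_cv: float) -> float:
    """Column volumes into the gradient at which a given %B is reached."""
    if not min(start_pct, end_pct) <= pct <= max(start_pct, end_pct):
        raise ValueError("pct must lie within the gradient range")
    return (pct - start_pct) * length_cv / (end_pct - start_pct)


def salt_mM(pct_b: float, salt_a_mM: float = 0.0, salt_b_mM: float = 1000.0) -> float:
    """Salt concentration (mM) of an A/B mixture at a given %B."""
    frac = pct_b / 100.0
    return (1.0 - frac) * salt_a_mM + frac * salt_b_mM


if __name__ == "__main__":
    # Dynactin: 35-100% buffer B over 26 CV, elution at 75-80% buffer B.
    for pct in (75.0, 80.0):
        cv = cv_at_percent_b(pct, 35.0, 100.0, 26.0)
        print(f"{pct:.0f}% B: {cv:.1f} CV into gradient, "
              f"{salt_mM(pct):.0f} mM potassium acetate")
```

Under these assumptions, the 75–80% buffer B window of a 35–100% gradient run over 26 CV corresponds to roughly 16–18 CV into the gradient and about 750–800 mM potassium acetate.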
complex elutes from ~60–70% buffer B. Peak fractions were pooled and concentrated, 0.1 mM Mg-ATP and 10% glycerol were added and the samples were snap frozen in liquid nitrogen. A typical preparation yielded 150–300 nM dynein.
Human NINL was purified as previously described65. pET28a-ZZ-TEV-Halo-NINL1–702 was transformed into BL-21[DE3] cells (New England Biolabs), which were grown to an optical density at 600 nm of 0.4–0.6, and expression was induced with 0.1 mM IPTG for 16 h at 18 °C. Frozen cell pellets from 1 l culture were resuspended in 40 ml of activator-lysis buffer (30 mM HEPES [pH 7.4], 50 mM potassium acetate, 2 mM magnesium acetate, 1 mM EGTA, 1 mM DTT, 0.5 mM Pefabloc, 10% (v/v) glycerol) supplemented with 1 cOmplete EDTA-free protease inhibitor cocktail tablet (Roche) per 50 ml and 1 mg ml−1 lysozyme. The resuspension was incubated on ice for 30 min and lysed by sonication. The lysate was clarified by centrifuging at 66,000g for 30 min in a Type 70 Ti rotor (Beckman). The clarified supernatant was incubated with 2 ml of packed IgG Sepharose 6 Fast Flow beads (GE Healthcare Life Sciences) for 2 h on a roller. The beads were transferred to a gravity flow column, washed with 100 ml of activator-lysis buffer supplemented with 150 mM potassium acetate and 50 ml of cleavage buffer (50 mM Tris–HCl, pH 8.0, 150 mM potassium acetate, 2 mM magnesium acetate, 1 mM EGTA, 1 mM DTT, 0.5 mM Pefabloc and 10% (v/v) glycerol). The beads were then resuspended and incubated in 15 ml of cleavage buffer supplemented with 0.2 mg ml−1 TEV protease overnight on a roller. The supernatant containing cleaved proteins was concentrated using a 50K MWCO concentrator (EMD Millipore) to 1 ml, filtered by centrifuging with Ultrafree-MC VV filter (EMD Millipore) in a tabletop centrifuge, diluted to 2 ml in buffer A (30 mM HEPES, pH 7.4, 50 mM potassium acetate, 2 mM magnesium acetate, 1 mM EGTA, 10% (v/v) glycerol and 1 mM DTT) and injected onto a MonoQ 5/50 GL column (GE Healthcare and Life Sciences) at 0.5 ml min−1. The column was pre-washed with 10 CV of buffer A, 10 CV of buffer B (30 mM HEPES, pH 7.4, 1 M potassium acetate, 2 mM magnesium acetate, 1 mM EGTA, 10% (v/v) glycerol and 1 mM DTT) and again with 10 CV of buffer A at 1 ml min−1. To elute, a linear gradient was run over 26 CV from 0–100% buffer B. The peak fractions containing unlabelled Halo-tagged NINL were collected and concentrated using a 50K MWCO concentrator (EMD Millipore) to 0.2 ml. A typical NINL prep yield was approximately 5–10 μM dimer.

microtubules to the coverslip. Flow chambers containing adhered microtubules were washed twice with LRRK2 buffer (20 mM HEPES pH 7.4, 80 mM NaCl, 0.5 mM TCEP, 5% glycerol, 2.5 mM MgCl2 and 20 μM GDP). Flow chambers were then incubated for 5 min either with (1) LRRK2 buffer alone or with the indicated kinase inhibitors (0 nM LRRK2RCKW condition) or (2) LRRK2 buffer containing LRRK2RCKW alone, with DMSO, or with kinase inhibitors. DMSO or drugs were incubated with LRRK2 buffer (± LRRK2RCKW) for 10 min at room temperature before adding to the flow chambers. Before the addition of dynein and kinesin motors, the flow chambers were washed twice with motility assay buffer containing 1 mg ml−1 casein. To assemble dynein-dynactin-ninein-like (NINL) complexes, purified dynein (10–15 nM), dynactin and NINL were mixed at 1:2:10 molar ratio and incubated on ice for 10 min. The final imaging buffer for motors contained motility assay buffer supplemented with an oxygen scavenger system, 71.5 mM β-mercaptoethanol and either 1 mM ATP (kinesin) or 2.5 mM ATP (dynein). The final concentrations of kinesin and dynein in the flow chambers were 2.5 nM and 0.3 nM, respectively. K560–GFP was imaged every 500 ms for 2 min with 25% laser (488) power at 150 ms exposure time. Dynein-TMR-dynactin-NINL was imaged every 300 ms for 3 min with 25% laser (561) power at 100 ms. Each sample was imaged no longer than 15 min. Each technical replicate consisted of movies from at least two fields of view containing between 5 and 10 microtubules each.

Single-molecule motility assay analysis
Kymographs were generated from motility movies and quantified for run lengths, percent motility, and velocity using ImageJ (NIH). Specifically, maximum-intensity projections were generated from time-lapse sequences to define the trajectory of particles on a single microtubule. The segmented line tool was used to trace the trajectories and map them onto the original video sequence, which was subsequently re-sliced to generate a kymograph. Motile and immotile events (>1 s) were manually traced. Bright aggregates, which were less than 5% of the population, were excluded from the analysis. For dynein–dynactin–NINL, both stationary and diffusive events were grouped as immotile. Run-length measurements were calculated from motile events only. Error bars for run length analysis were generated using a bootstrapping method (run length values from each condition were resampled 200 times using an
XLSTAT program for Excel) and statistical significance was established
Single-molecule microscopy and motility assays using a one-way ANOVA with Dunnett’s test for multiple comparisons or
Single-molecule imaging was performed using total internal reflection an unpaired Welch’s t-test for only one comparison. For percent motility
fluorescence (TIRF) microscopy with an inverted microscope (Nikon, per microtubule measurements, motile events (>1 s and >1 μm) were
Ti-E Eclipse) equipped with a 100× 1.49 NA oil immersion objective divided by total events per kymograph. Velocity measurements were
(Nikon, Plano Apo), and a MLC400B laser launch (Agilent), with 405 nm, calculated from the inverse slopes of the motile event traces (>1 s and
488 nm, 561 nm and 640 nm laser lines. Excitation and emission paths >1 μm) only. Statistical analyses were performed in Prism8 (Graphpad).
were filtered using single bandpass filter cubes (Chroma), and emitted
signals were detected with an electron multiplying CCD camera (Andor Cell line
Technology, iXon Ultra 888). Illumination and image acquisition were HEK293T cells were purchased from ATCC (CRL-3216) and authenticated
controlled with NIS Elements Advanced Research software (Nikon), by ATCC. This cell line was tested for mycoplasma before expanding
and the xy position of the stage was controlled with a ProScan linear and freezing. After thawing each time, it was tested again. In addition,
motor stage controller (Prior). all cells grown in the laboratory are routinely tested for mycoplasma
Single-molecule motility were performed in flow chambers as every three months. The cells used in the experiments reported here
previously described25 using the setup shown in the schematic in were last tested on 10/16/19 and showed no contamination.
Fig. 4a. Biotin-PEG-functionalized coverslips (Microsurfaces) were
adhered to a Superfrost Plus Microscope slide (ThermoFisher) Western blot analysis
using double-sided scotch tape. Each slide contained four 293T cells were maintained in DMEM (containing 10% fetal bovine serum
flow-chambers. Taxol-stabilized microtubules (approximately 15 and 1% penicillin/streptomycin). For western blot quantification of
mg ml−1) with 10% biotin-tubulin and 10% Alex405-tubulin were pre- LRRK2 protein expression (Extended Data Fig. 9l), cells were plated on
pared as previously described25. For each motility experiment, 1 6-well dishes (150,000 cells per well) 24 h before transfection. Cells were
mg ml−1 Strepdavidin (in 30 mM HEPES, 2 mM magnesium acetate, transfected with 1 μg of GFP–I2020T using polyethylenimine (PEI, Poly-
1 mM EGTA, 10% glycerol) was incubated in the flow chamber for sciences). After 48 h, cells were treated for 30 min with either 5 μM GZD-
3 min. A 1:150 dilution of taxol-stabilized microtubules in motility 824 or DMSO-matched control. Cells were lysed on ice in RIPA buffer
assay buffer (30 mM HEPES, 50 mM potassium acetate, 2 mM magne- (50 mM Tris pH 7.5, 150 mM NaCl, 0.2% Triton X-100, 0.1% SDS, 0.5%
sium acetate, 1 mM EGTA, 10% glycerol, 1 mM DTT and 20 μM Taxol, pH Na-deoxycholate, with cOmplete protease inhibitor cocktail). Lysates
7.4) was added to the flow chamber for 3 min to adhere polymerized were further rotated for 15 min at 4 °C and clarified by centrifuging
at 13,000g for 15 min. Clarified supernatants were boiled for 5 min in Laemmli buffer. The experiments were performed in triplicate.
For western blots, lysates were run on 4–12% gradient SDS–PAGE (Life Technologies) for 60 min and transferred to nitrocellulose for 3 h at 250 mA. Blots were dried at room temperature for 30 min, rinsed in 1× TBS, followed by blocking with 5% milk in TBS. Antibodies were diluted in 5% milk in TBS with 0.1% Tween-20 (TBS-T). Primary antibodies were incubated overnight at 4 °C and infrared secondary antibodies were incubated at room temperature for 45 min. For quantification of LRRK2 expression levels, blots were imaged on an Odyssey CLx controlled by Imaging Studio software (v.5.2). DMSO and GZD-824 conditions were quantified in triplicate and normalized to a GAPDH loading control using Empiria Studio software (LI-COR). To ensure quantification was in the combined linear range for antibodies detecting both GFP–LRRK2 and GAPDH, a linear dilution series of lysates from cells expressing GFP–LRRK2 was also quantified by infrared western blot.

Immunofluorescence, confocal microscopy and image analysis
The day before transfection, 293T cells were plated on acid-treated coverslips (Bellco Glass) pre-coated with 100 μg ml−1 poly-d-lysine (Sigma) and 4 μg ml−1 mouse laminin (ThermoFisher) in 24-well plates (35,000 cells per well). Cells were transfected with 500 ng plasmid of either pDEST53-GFP-LRRK2 or pDEST53-GFP-LRRK2(I2020T) using PEI. After 48–72 h, cells were incubated with either a kinase inhibitor or DMSO-matched control (matched for time and concentration). For the type I inhibitor experiment, cells were incubated with DMSO or MLi-2 (500 nM) for 2 h. For the type II inhibitor, cells were incubated with DMSO or GZD-824 (5 or 10 μM) for 30 min. Cells were quickly washed once on ice with ice-cold PBS, and fixed with ice-cold 4% PFA, 90% methanol, 5 mM sodium bicarbonate for 10 min at −20 °C. After fixation, the wells were immediately washed three times with ice-cold PBS. Blocking buffer (1% BSA, 5% normal goat serum, 0.3% Triton X-100 in PBS) was added for 1 h at room temperature. Primary antibodies were diluted (1:500) in antibody dilution buffer (1% BSA, 0.1% Triton X-100 in PBS) and incubated overnight at 4 °C. After overnight incubation, the wells were washed three times in PBS and incubated with secondary antibodies (1:500) in antibody dilution buffer for 1 h at room temperature. After secondary incubation, the wells were washed three times in PBS, once in ddH2O and mounted using CitiFluor AF2 (EMS) on Superfrost Plus Microscope slides (ThermoFisher). Coverslips were sealed with nail polish and stored at 4 °C.
For the LRRK2 filament analysis in Fig. 5, experimenters were blinded to condition for both the imaging acquisition and analysis. Cells were imaged using a Nikon A1R HD confocal microscope with a LUN-V laser engine (405 nm, 488 nm, 561 nm and 640 nm) and DU4 detector using bandpass and long-pass filters for each channel (450/50, 525/50, 595/50 and 700/75). Slides were imaged on a Nikon Ti2 body using an Apo 60× 1.49 NA objective. Image stacks were acquired in resonant scanning mode with bidirectional scanning and 4× line averaging and 1.2 Airy units. The lasers used were 405 nm, 488 nm and 561 nm. Illumination and image acquisition were controlled by NIS Elements Advanced Research software (Nikon Instruments). ImageJ was used to quantify the percentage of cells with LRRK2 filaments. Maximum-intensity projections were generated from z-stack confocal images. Using the GFP immunofluorescence signal, transfected cells were traced. Cells were scored for the presence or absence of filaments using both the z-projection and z-stack micrographs as a guide. The presence of filaments was scored if the cells had either (1) a GFP filament signal greater than 5 μm or (2) bundles of filaments with at least two identifiable crosses. To calculate the percentage of cells with filaments, the number of cells with filaments was divided by the total number of transfected cells per technical replicate (defined as one 24-well coverslip). Approximately 20 cells were quantified per replicate for each condition in Fig. 5c (DMSO versus MLi-2) and between 40 and 100 cells were quantified per replicate for each condition in Fig. 5d (DMSO versus GZD-824). The quantification of all cellular experiments comes from data collected on three separate days except for the 10 μM GZD-824 condition in Fig. 5d, which was performed on two separate days. All statistical analyses were performed in Prism 8 (GraphPad).

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
All reagents and data will be made available upon request. Model coordinates for the LRRK2RCKW structure are deposited in the PDB as follows: (1) PDB accession code 6VP6: LRRK2RCKW with the adjacent COR-B and WD40 domains (from the trimer) used to optimize residues at those interfaces during refinement in Rosetta, with GDP-Mg2+ bound; (2) PDB accession code 6VNO: the top 10 models for LRRK2RCKW without adjacent domains, with GDP-Mg2+ bound; (3) PDB accession code 6VP8: LRRK2RCKW with the adjacent COR-B and WD40 domains (from the trimer) used to optimize residues at those interfaces during refinement in Rosetta, no GDP-Mg2+ bound; (4) PDB accession code 6VP7: the top 10 models for LRRK2RCKW without adjacent domains, no GDP-Mg2+ bound. Cryo-EM maps for the different LRRK2RCKW structures are deposited at the Electron Microscopy Data Bank (EMDB) as follows: (1) EMDB accession code EMD-21250: this deposition contains both the 3.5 Å map of LRRK2RCKW trimer (used to build the COR-B, kinase and WD40 domains) and the 3.8 Å map of the signal-subtracted LRRK2RCKW trimer (used to build the ROC and COR-A domains); (2) EMDB accession code EMD-21306: 8.1 Å map of LRRK2RCKW monomer; (3) EMDB accession code EMD-21309: 9.5 Å map of COR-mediated LRRK2RCKW dimer in the absence of kinase ligand (apo); (4) EMDB accession code EMD-21310: 13.4 Å map of WD40-mediated LRRK2RCKW dimer in the absence of kinase ligand (apo); (5) EMDB accession code EMD-21311: 9.0 Å map of COR-mediated LRRK2RCKW dimer in the presence of MLi-2; (6) EMDB accession code EMD-21312: 10.2 Å map of WD40-mediated LRRK2RCKW dimer in the presence of MLi-2. All other data that support the findings of this study are available from the corresponding authors upon reasonable request.

45. Wagner, T. et al. SPHIRE-crYOLO is a fast and accurate fully automated particle picker for cryo-EM. Commun. Biol. 2, 218 (2018).
46. Woehlke, G. et al. Microtubule interaction site of the kinesin motor. Cell 90, 207–216 (1997).
47. Suloway, C. et al. Automated molecular microscopy: the new Leginon system. J. Struct. Biol. 151, 41–60 (2005).
48. Zheng, S. Q. et al. MotionCor2: anisotropic correction of beam-induced motion for improved cryo-electron microscopy. Nat. Methods 14, 331–332 (2017).
49. Lis, P. et al. Development of phospho-specific Rab protein antibodies to monitor in vivo activity of the LRRK2 Parkinson’s disease kinase. Biochem. J. 475, 1–22 (2018).
50. Lander, G. C. et al. Appion: an integrated, database-driven pipeline to facilitate EM image processing. J. Struct. Biol. 166, 95–102 (2009).
51. Roseman, A. M. FindEM—a fast, efficient program for automatic selection of particles from electron micrographs. J. Struct. Biol. 145, 91–99 (2004).
52. Zivanov, J. et al. New tools for automated high-resolution cryo-EM structure determination in RELION-3. eLife 7, e42166 (2018).
53. Punjani, A., Rubinstein, J. L., Fleet, D. J. & Brubaker, M. A. cryoSPARC: algorithms for rapid unsupervised cryo-EM structure determination. Nat. Methods 14, 290–296 (2017).
54. Henderson, R. et al. Outcome of the first electron microscopy validation task force meeting. Structure 20, 205–214 (2012).
55. Scheres, S. H. W. & Chen, S. Prevention of overfitting in cryo-EM structure determination. Nat. Methods 9, 853–854 (2012).
56. Rosenthal, P. B. & Henderson, R. Optimal determination of particle orientation, absolute hand, and contrast loss in single-particle electron cryomicroscopy. J. Mol. Biol. 333, 721–745 (2003).
57. Chen, S. et al. High-resolution noise substitution to measure overfitting and validate resolution in 3D structure determination by single particle electron cryomicroscopy. Ultramicroscopy 135, 24–35 (2013).
58. Wang, R. Y.-R. et al. Automated structure refinement of macromolecular assemblies from cryo-EM maps using Rosetta. eLife 5, 352 (2016).
59. Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D 60, 2126–2132 (2004).
60. Söding, J., Biegert, A. & Lupas, A. N. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 33, W244–W248 (2005).
61. Greggio, E. et al. The Parkinson’s disease kinase LRRK2 autophosphorylates its GTPase domain at multiple sites. Biochem. Biophys. Res. Commun. 389, 449–454 (2009).
62. Pettersen, E. F. et al. UCSF Chimera—a visualization system for exploratory research and analysis. J. Comput. Chem. 25, 1605–1612 (2004).
63. Charrier, J.-D. et al. Discovery and structure-activity relationship of 3-aminopyrid-2-ones as potent and selective interleukin-2 inducible T-cell kinase (Itk) inhibitors. J. Med. Chem. 54, 2341–2350 (2011).
64. Nicholas, M. P., Rao, L. & Gennerich, A. An improved optical tweezers assay for measuring the force generation of single kinesin molecules. Methods Mol. Biol. 1136, 171–246 (2014).
65. Htet, Z. M. et al. LIS1 promotes the formation of activated cytoplasmic dynein-1 complexes. Nat. Cell Biol. 22, 518–525 (2020).
66. Kendrick, A. A. et al. Hook3 is a scaffold for the opposite-polarity microtubule-based motors cytoplasmic dynein-1 and KIF1C. J. Cell Biol. 218, 2982–3001 (2019).
67. Deniston, C. K. Parkinson’s disease-linked LRRK2 structure and model for microtubule interaction. PhD thesis, Univ. California, San Diego (2020).

Acknowledgements We thank S. Taylor for her role in initiating this collaborative work, which was partially supported by multi-investigator grants from the Michael J. Fox Foundation: grants 11425 and 11425.02 (PI: S. Taylor) and 18321 (PIs: A.E.L. and S.L.R.-P.). We also thank the UC San Diego Cryo-EM Facility, the Nikon Imaging Center at UC San Diego, where the confocal microscopy was performed, the use of instruments at the Electron Imaging Center for NanoMachines supported by NIH (1S10RR23057, 1S10OD018111, and 1U24GM116792), NSF (DBI-1338135) and CNSI at UCLA; J. P. Gillies and A. Kendrick for technical support with protein purifications, and A. Dickey for feedback on the manuscript. C.K.D. was initially supported by the Molecular Biophysics Training Grant (NIH grant T32 GM008326) and subsequently by a Predoctoral Fellowship from the Visible Molecular Cell Consortium and Center for Trans-scale Structural Biology (UC San Diego). D.S. is supported by an A. P. Giannini Foundation postdoctoral fellowship. A.K.S. receives salary and support from the Ludwig Institute for Cancer Research. E.V. is supported by a NIH Director’s New Innovator Award DP2GM123494. S.L.R.-P. is an investigator of the Howard Hughes Medical Institute and is also supported by R01GM121772. A.E.L. is supported by R01GM107214. S.K. is grateful for support from the SGC, a registered charity that receives funds from AbbVie, Bayer Pharma AG, Boehringer Ingelheim, Canada Foundation for Innovation, Eshelman Institute for Innovation, Genome Canada, Innovative Medicines Initiative EUbOPEN (agreement No 875510), Janssen, Merck KGaA, MSD, Ontario Ministry of Economic Development and Innovation, Pfizer, São Paulo Research Foundation-FAPESP, Takeda, and the Wellcome, as well as Boehringer Ingelheim for funding initial structural studies of this project. Most of this work is described in the thesis67 by C.K.D.

Author contributions C.K.D. collected and processed the cryo-EM data. J.S. performed the single-molecule and cellular assays with the help of O.D. S.M. designed the LRRK2RCKW construct and purified the protein. C.K.D. and I.L. built the molecular model of LRRK2RCKW. D.M.S. performed the SEC–MALS and phosphorylation assays. M.M. collected and analysed the LRRK2RCKW and microtubule cryo-EM data. R.W. and J.B. performed the cellular cryo-ET. A.K.S. contributed to the structural analysis and provided advice on the selection of kinase inhibitors. S.K., E.V., S.L.R.-P. and A.E.L. directed and supervised the research. C.K.D., J.S., S.L.R.-P. and A.E.L. wrote the manuscript and S.M., D.S., M.M., O.D., A.K.S., S.K. and E.V. edited it.

Competing interests The authors declare no competing interests.

Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2673-2.
Correspondence and requests for materials should be addressed to S.L.R.-P. or A.E.L.
Peer review information Nature thanks Asa Abelovich, Henning Stahlberg and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Extended Data Fig. 1 | Optimization of LRRK2 constructs and cryo-EM analysis of a LRRK2RCKW trimer. a, We systematically scanned domain boundaries (amino acid numbers of boundaries noted above domain names) to generate LRRK2 constructs that expressed well in baculovirus-infected insect cells and yielded stable and soluble protein. These attempts included full-length LRRK2, the kinase domain alone or with the WD40 domain, and other isolated domains. In this approach, only the GTPase domain on its own expressed well. Next, we gradually shortened LRRK2 from its amino terminus. Red asterisks indicate constructs that were soluble. b, After identifying domain boundaries yielding constructs that expressed soluble protein, additional fine tuning of boundaries was performed. A Coomassie-stained SDS–PAGE gel shows systematic N-terminal truncations at the ROC domain resulting in the identification of a construct with the highest expression levels: amino acids 1327–2527 (red asterisk, ‘LRRK2RCKW’ here). c, A Coomassie-stained SDS–PAGE gel of purified LRRK2RCKW after elution from an S200 gel filtration column. As predicted by its primary structure, LRRK2RCKW runs at approximately 140 kDa. d, Electron micrograph of LRRK2RCKW. e, 2D class averages of the LRRK2RCKW trimer. f, 2D/3D classification scheme used to obtain the 3.5 Å structure of the LRRK2RCKW trimer. g, h, Fourier shell correlations (from cryoSPARC) (g) and Euler angle distribution (h) for the LRRK2RCKW trimer.

Extended Data Fig. 2 | Cryo-EM analysis of a signal-subtracted LRRK2RCKW trimer and map-to-model fit. a, Processing strategy used to obtain a 3.8 Å structure of LRRK2RCKW generated from a signal-subtracted trimer where only one monomer contains the ROC and COR-A domains. This structure improved the resolution of the ROC and COR-A domains relative to the full trimer (Extended Data Fig. 1). b–d, 2D class averages (b), Fourier shell correlations (from RELION) (c), and Euler angle distribution (from RELION) (d) for the 3.8 Å resolution signal-subtracted LRRK2RCKW structure. e, Close-ups (f–l) of different parts of the final model fit into the map. f, Section of the WD40 domain. g, C-terminal helix and its interface with the kinase domain. h, Active site of the kinase. Residues in the DYG motif are labelled. G2019, the site of a major PD-associated mutation (G2019S) and the last residue of the activation loop seen in our structure, is highlighted by a black rounded square. i, Interface between COR-B and the αC helix of the N-lobe of the kinase domain. j, Interface between the ROC and COR-B domains. R1441 and Y1699, two residues mutated in Parkinson’s disease, are labelled. k, l, Two different views of the ROC and COR-A domains with GDP-Mg2+ modelled into the density. Side chains were omitted in these two panels, corresponding to the lowest-resolution parts of the map. m, Map-to-model FSC plots for the top-ranked LRRK2RCKW models, with (left) or without (right) GDP-Mg2+ in the ROC domain. The 0.143 FSC values are reported in Supplementary Table 1. n, Size exclusion chromatography–multiple angle light scattering (SEC–MALS) analysis of LRRK2RCKW under the conditions used for cryo-EM (Fig. 1). The table shows the calculated molecular weights (MW) of LRRK2RCKW according to SEC standards and MALS.
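As an illustration of how a resolution value is read off an FSC curve at the 0.143 threshold mentioned in panel m, the following is a minimal sketch that linearly interpolates the first crossing below the threshold. It is not the procedure used by RELION or cryoSPARC; the function name `resolution_at_threshold` and the toy curve are hypothetical, and spatial frequencies are assumed to be given in 1/Å.

```python
def resolution_at_threshold(spatial_freqs, fsc_values, threshold=0.143):
    """Return the resolution (in Å) where the FSC first drops below threshold.

    spatial_freqs: increasing spatial frequencies in 1/Å (assumed positive).
    fsc_values: FSC value at each spatial frequency.
    Returns None if the curve never crosses the threshold.
    """
    if len(spatial_freqs) != len(fsc_values):
        raise ValueError("inputs must have the same length")
    if any(f <= 0 for f in spatial_freqs):
        raise ValueError("spatial frequencies must be positive")
    pairs = list(zip(spatial_freqs, fsc_values))
    for (f0, c0), (f1, c1) in zip(pairs, pairs[1:]):
        if c0 >= threshold > c1:
            # Linear interpolation between the two bracketing shells.
            f_cross = f0 + (c0 - threshold) * (f1 - f0) / (c0 - c1)
            return 1.0 / f_cross
    return None


# Hypothetical toy curve, for illustration only.
freqs = [0.1, 0.2, 0.3, 0.4]
fsc = [1.0, 0.9, 0.243, 0.043]
print(resolution_at_threshold(freqs, fsc))
```

In this toy curve the crossing falls halfway between the 0.3 and 0.4 1/Å shells, so the reported value corresponds to a spatial frequency of 0.35 1/Å.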

Extended Data Fig. 3 | Comparisons between LRRK2 and other kinases and modelling of the LRR into LRRK2RCKW. a, View of the LRRK2RCKW atomic model with COR-A, COR-B and kinase domains coloured. The N- and C-lobes of the kinase are labelled, as is the αC helix in the N-lobe. b, c, The FAK-FERM (PDB code 2J0J)17 (b) and CDK2-cyclin A (PDB code 2CCH)19 (c) complexes, shown in the same orientation as the kinase in a. The αC helix of CDK2 is also labelled. d, Same view as in a with only the kinase domain and the C-terminal helix coloured. e, Rotated view of the LRRK2 kinase domain with the C-terminal helix facing the viewer. f, g, CDKL3 (PDB code 3ZDU) (f) and RIPK2 (PDB code 4C8B)32 (g) shown in the same orientation as the LRRK2 kinase in e, with alpha helices with the same general location as the LRRK2 C-terminal helix coloured in green. h, KSR2-MEK1 complex (PDB code 2Y4I), with the kinase oriented as in e (left) and after removing KSR2 for clarity (right). The alpha helix associated with the kinase is shown in green. i, HCK (PDB code 2HCK) in complex with its SH2 and SH3 domains with the kinase oriented as in e (left), and after removal of the SH2 and SH3 domains for clarity (right). A remaining alpha helix from the SH2 domain is shown in yellow. j, Front view of the LRRK2 kinase with the C-spine and R-spine residues coloured in grey and white, respectively. k, Close-up of the DYG motif and neighbouring R-spine residues. A putative hydrogen bond between Y2018 and the backbone carbonyl of I1933 is shown (O–O distance: 2.7 Å). This interaction provides a structural explanation for the hyperactivation of the kinase resulting from a Y2018F mutation38, which would release the activation loop. l, Crystal structure of the LRR–ROC–COR(A/B) domains from C. tepidum Roco (PDB code 6HLU)7. m, Homology model for human LRR–ROC–COR(A/B) based on the C. tepidum Roco structure (from SWISS-MODEL). n, Chimeric model combining LRRK2RCKW and the homology model for the LRR domain from m obtained by aligning their ROC–COR(A/B) domains. o, p, Two views of the hybrid LRRK2LRCKW model. q, Close-up showing the proximity between the active site of the kinase (with the side chains of its DYG motif shown) and the S1292 autophosphorylation site on the LRR. The close-up also highlights the proximity between N2081, a residue implicated in Crohn’s disease, and the LRR.

Extended Data Fig. 4 | Comparison between LRRK2RCKW and integrative models built into cryo-ET maps of LRRK2 filaments in cells and docking of LRRK2RCKW into those maps. a, Root-mean-square deviation (r.m.s.d.) between the atomic model of LRRK2RCKW and each of the 1,167 integrative models previously generated5. r.m.s.d. values were calculated in Chimera62 using 100% residue similarity and with pruning iterations turned off. r.m.s.d. values are grouped into 53 clusters of related models (see ref. 5 for details), with the mean and standard deviation shown whenever the cluster contains two or more models. Integrative models that gave the lowest, median and highest r.m.s.d. values are shown. The models are coloured according to the per-residue r.m.s.d. with the atomic model of LRRK2RCKW. b, The WD40s in the crystal structure of a dimer of the LRRK2 WD40 (PDB code: 6DLP)9 were replaced with the WD40s from our cryo-EM structure of LRRK2RCKW. c, The resulting dimer was fitted into the 14 Å cryo-ET map of cellular microtubule-associated LRRK2 filaments5. d, Two views of the same fitting shown in c, displayed with a higher threshold for the map to highlight the fitting of the WD40 β-propellers into the density. The white arrows point towards the holes at the centre of the β-propeller densities. e, Four copies of LRRK2RCKW were docked into the cryo-ET map by aligning their WD40 domains to the docked WD40 dimer. f, Model containing the four aligned LRRK2RCKW. g–j, Modelling of the kinase-closed form of LRRK2RCKW. g, h, The structure of ITK bound to an inhibitor (PDB code 3QGY)63, which is in a closed conformation, was aligned to LRRK2RCKW using only the C-lobes of the two kinases. i, The N-terminal portion of LRRK2RCKW, comprising ROC, COR-A, COR-B and the N-lobe of the kinase, was aligned to ITK using only the N-lobes of the kinases. ROC, COR-A and COR-B were moved as a rigid body in this alignment. j, Kinase-closed model of LRRK2RCKW.
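To make the r.m.s.d. comparison in panel a concrete, the following is a minimal sketch of an r.m.s.d. over matched atom pairs for two coordinate sets that are assumed to be already superposed, followed by the per-cluster summary described in the legend (mean and standard deviation only when a cluster holds two or more models). It is not Chimera's implementation; the function names, the toy coordinates and the cluster labels are hypothetical, and the standard deviation is assumed to be the sample standard deviation.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def rmsd(coords_a, coords_b):
    """r.m.s.d. over matched (x, y, z) pairs of two pre-superposed structures."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def summarize_by_cluster(cluster_value_pairs):
    """Map each cluster to (mean, s.d.); s.d. is None for single-model clusters."""
    groups = defaultdict(list)
    for cluster, value in cluster_value_pairs:
        groups[cluster].append(value)
    return {
        cluster: (mean(values), stdev(values) if len(values) >= 2 else None)
        for cluster, values in groups.items()
    }


# Hypothetical toy data, for illustration only.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
print(rmsd(reference, model))
print(summarize_by_cluster([("A", 2.0), ("A", 4.0), ("B", 5.0)]))
```

No pairs are discarded in this sketch, which mirrors the idea of turning pruning off: every matched pair contributes to the value.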
Extended Data Fig. 5 | Ab initio models for cryo-EM of LRRK2RCKW dimers and cryo-EM analysis of WD40- and COR-mediated dimers of LRRK2RCKW in the presence of the inhibitor MLi-2. a, An initial dataset was collected from a sample of LRRK2RCKW incubated in the presence of the kinase inhibitor MLi-2 and dimers were selected. b, Representative two-dimensional class averages used for ab initio model building. c, Ab initio models with the structure of LRRK2RCKW docked in. d, Volumes generated from the molecular models in b, filtered to 30 Å resolution. e, Projections of the volumes in d shown in the same order as their corresponding 2D class averages in b. f, Data processing strategy for obtaining cryo-EM structures of WD40- and COR-mediated dimers of LRRK2RCKW in the presence of the inhibitor MLi-2. The models used during this processing (Methods) are those shown in d along with an additional linear trimer (Methods) used for particle sorting.
Extended Data Fig. 6 | Cryo-EM analysis of a monomer and WD40- and COR-mediated dimers of LRRK2RCKW in the absence of inhibitor (apo) and dimerization of LRRK2RCKW outside the filaments. a, Data-processing strategy for obtaining cryo-EM structures of a monomer and WD40- and COR-mediated dimers of LRRK2RCKW in the absence of inhibitor. The models used during the processing of the dimers (Methods) are those shown in Extended Data Fig. 5d, along with an additional linear trimer (Methods) used for particle sorting. The models used for processing of the monomer (Methods) were the same dimer models as in Extended Data Fig. 5d (used for particle sorting) in addition to a monomer model generated from our LRRK2RCKW model (used for refinement). b, Two-dimensional (2D) class averages of WD40- and COR-mediated LRRK2RCKW dimers obtained in the absence of inhibitors (apo) or in the presence of either ponatinib or MLi-2. The same molecular models of the two dimers shown in Fig. 3 are shown on the left but in orientations similar to those represented by the 2D class averages shown here. For each class average, a projection from the corresponding model in the best-matching orientation is shown to its left. c, Two copies of the LRRK2RCKW structure were aligned to the ROC–COR domains of the LRR–ROC–COR structure from the C. tepidum Roco protein (PDB code 6HLU) to replicate the interface observed in the bacterial homologue in the context of the human protein. This panel shows a comparison between the dimer modelled based on the C. tepidum LRR–ROC–COR structure and the dimer observed for LRRK2RCKW in this work. Although the bacterial structure shows a dimerization interface that involves the GTPase (ROC), LRRK2RCKW interacts exclusively through its COR-A and -B domains, with the ROC domains located away from this interface. The two arrangements are shown schematically in cartoon form below the structures.
Extended Data Fig. 7 | Properties of the microtubule-associated LRRK2RCKW filaments. a, b, The LRRK2RCKW structure solved in this work (a) was split at the junction between the N- and C-lobes of the kinase domain (L1949-A1950) (b). c, Docking of the two halves of LRRK2RCKW into a cryo-EM map of a LRRK2RCKW dimer solved in the presence of MLi-2. The dimer map is the same one shown in Fig. 3 and Extended Data Figs. 10 and 11. d, The model obtained in c was docked into cryo-EM maps of either WD40- or COR-mediated dimers obtained in the presence of MLi-2. e, Molecular models resulting from the docking in d. f, Aligning, in alternating order, copies of the dimer models generated in d and e results in a right-handed filament with dimensions compatible with those of a microtubule, and its ROC domains pointing inwards (see Fig. 3g, h for more details). g, Docking of the two halves of LRRK2RCKW into a cryo-EM map of a LRRK2RCKW monomer solved in the absence of inhibitor (apo). The map is the one shown in Fig. 1g and Extended Data Fig. 6. h, Three-way comparison of LRRK2RCKW (with domain colours) and the models resulting from the dockings into the MLi-2 WD40-mediated dimer map (c) (dark blue) and apo monomer map (g) (light blue). The three structures were aligned using the C-lobes of their kinases and the WD40 domain. The superposition illustrates that the docking into the apo map results in a structure very similar to that obtained from the trimer (Fig. 1) and that the presence of MLi-2 leads to a closing of the kinase. i, Molecular model of the microtubule-associated LRRK2RCKW filament obtained by docking a fragment of a microtubule structure (PDB code 6O2S) into the corresponding density in the sub-tomogram average (Fig. 2a). j, Same view as in i with the models shown as surface representations coloured by their Coulomb potential. k, l, ‘Peeling off’ of the structure shown in j, with the LRRK2RCKW filament seen from the perspective of the microtubule surface (k) and the microtubule surface seen from the perspective of the LRRK2RCKW filament (l). Note that the acidic C-terminal tubulin tails are not ordered in the microtubule structure and are therefore not included in the surface charge distributions. The Coulomb potential colouring scale is shown on the right.
Extended Data Fig. 8 | Inhibition of motor motility by wild-type and I2020T mutant LRRK2RCKW. a, Example kymographs showing that increasing concentrations of LRRK2RCKW reduce kinesin runs. b, Example kymographs showing that 25 nM LRRK2RCKW reduces dynein runs. c, Representative kymographs of kinesin motility in the presence or absence of wild-type and I2020T mutant LRRK2RCKW. d, The percentage of motile kinesin events per microtubule in the absence of LRRK2 or in the presence of 25 nM wild-type or I2020T mutant LRRK2RCKW. Data are mean ± s.d. (n = 12 microtubules per condition quantified from two independent experiments). There is a significant difference between 0 nM and both 25 nM RCKW conditions (P < 0.0001), but no significant (ns) difference between the inhibitory effects of wild-type LRRK2RCKW versus I2020T mutant LRRK2RCKW as calculated using the Kruskal–Wallis test with Dunn’s posthoc for multiple comparisons (compared to no LRRK2RCKW).
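As an illustration of the Kruskal–Wallis comparison named in panel d, the following is a minimal standard-library Python sketch of the tie-corrected H statistic. It is not the authors' analysis (the Reporting Summary states that GraphPad Prism was used); the function names and the example percentages are hypothetical, the closed-form P value shown is valid only for three groups (two degrees of freedom), and Dunn's post hoc step is not included.

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        total += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for value in pooled:
        counts[value] = counts.get(value, 0) + 1
    # Undefined (division by zero) if every pooled value is identical.
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction


def p_value_three_groups(h):
    """Chi-square survival function for 2 degrees of freedom (k = 3 groups)."""
    return math.exp(-h / 2)


# Hypothetical percentages of motile events per microtubule (not source data).
no_lrrk2 = [62.0, 58.5, 70.1, 66.3]
wild_type = [21.4, 18.0, 25.2, 19.9]
i2020t = [20.7, 23.1, 17.5, 22.8]
h_stat = kruskal_wallis([no_lrrk2, wild_type, i2020t])
print(h_stat, p_value_three_groups(h_stat))
```

The chi-square approximation is coarse for very small samples, so this sketch only shows how the ranks enter the statistic.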
Article

Extended Data Fig. 9 | Type II kinase inhibitors rescue kinesin and dynein motility. a–e, Ponatinib is a type II, ‘DFG out’ inhibitor. a, Superposition of the structures of ponatinib-bound RIPK2 (PDB code 4C8B)32 and IRAK4 (PDB code 6EG9). Ponatinib is shown in yellow, and the DYG motif residues are shown in white. b, c, For comparison, the structures of Roco4 bound to LRRK2-IN-1 (PDB code 4YZM)35, a LRRK2-specific type I, ‘DFG in’ inhibitor (b), and a model of MAPK1 bound to MLi-2 (PDB code 5U6I)22, another LRRK2-specific type I, ‘DFG in’ inhibitor (c) are shown. The inhibitor and DFG residues are coloured as in a. d, The structures in a–c, as well as the kinase from LRRK2RCKW are shown superimposed. The colour arrowheads point to the N-lobe β-sheet to highlight the difference in conformation between kinases bound to the two different types of inhibitors. Note that the LRRK2RCKW kinase is even more open than the two ponatinib-bound kinases. e, Rotated view of d, now highlighting the position of the N-lobe αC helix. An additional alpha helix in the N-lobe of MAPK1 was removed from this view for clarity. f, The kinase inhibitors MLi-2 (1 μM), LRRK2-IN-1 (1 μM), ponatinib (10 μM) and GZD-824 (10 μM) all inhibit the LRRK2RCKW kinase activity in vitro compared to a DMSO control. A western blot using a phospho-specific antibody to Rab8A at the indicated time points is shown. g, A dose–response curve showing the percentage of motile kinesin events per microtubule as a function of ponatinib concentration with LRRK2RCKW (25 nM) or without LRRK2RCKW. Data are mean ± s.d. (from left to right: n = 12, 18, 16, 14 and 9 microtubules quantified from one experiment). ****P < 0.0001, Kruskal–Wallis test with Dunn’s posthoc for multiple comparisons, compared to DMSO without LRRK2RCKW. h, Dose–response curve of run lengths from data in g represented as a cumulative frequency distribution. From top to bottom: n = 654, 173, 584, 293 and 129 motile kinesin events. Mean decay constants (tau) ± confidence interval are (from top to bottom) 2.736 ± 0.113, 1.291 ± 0.181, 2.542 ± 0.124, 2.285 ± 0.134, and 1.653 ± 0.17. i, Representative kymographs of kinesin and dynein with DMSO or type II inhibitors with or without LRRK2RCKW. j, The type II kinase inhibitors ponatinib and GZD-824 rescue kinesin run length, represented as a cumulative frequency distribution of run lengths with LRRK2RCKW (25 nM) or without LRRK2RCKW. From top to bottom: n = 893, 355, 507, 499, 524 and 529 runs from two independent experiments. Mean decay constants (tau) ± 95% confidence intervals are (from top to bottom) 2.070 ± 0.058, 0.8466 ± 0.091, 1.938 ± 0.065, 2.075 ± 0.07, 1.898 ± 0.065 and 1.718 ± 0.064. Data were resampled with bootstrapping analysis and statistical significance was established using a one-way ANOVA with Dunnett’s test for multiple comparisons. DMSO run lengths were significantly different (P < 0.0001) between conditions (0 vs 25 nM RCKW). Ponatinib (0 vs 25 nM RCKW) and GZD-824 (0 vs 25 nM LRRK2) were not significant. k, As in j but with dynein. From top to bottom: n = 659, 28, 289, 306, 254 and 339 runs from two independent experiments. Mean decay constants (tau, in micrometres) ± 95% confidence intervals are (from top to bottom) 4.980 ± 0.147, 0.846 ± 0.415, 4.686 ± 0.142, 4.445 ± 0.172, 3.156 ± 0.09 and 3.432 ± 0.188. Statistical significance as in j and run lengths were significantly different (P < 0.0001) between DMSO conditions (0 vs 25 nM RCKW), and not significant for ponatinib or GZD-824 conditions. The DMSO conditions are reproduced from Fig. 4f for comparison. l, Expression levels of GFP–LRRK2 (I2020T) in 293T cells treated with either DMSO or GZD-824 (5 μM). An immunoblot with anti-GFP (LRRK2) and anti-GAPDH (loading control), which is a representative image from three replicates, is shown. m, Quantification of GFP–LRRK2 (I2020T) expression levels from western blots similar to l. Data are mean ± s.d. (n = 3 per condition). GZD-824 is not significantly different from the DMSO-treated control (Mann–Whitney test). n, 293T cells immunostained for tubulin showing that the microtubule architecture is not affected by GZD-824 or MLi-2 compared to DMSO treatment. See Supplementary Table 1 for all source data and replicate information.
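The decay constants and bootstrapped confidence intervals quoted for the run-length distributions in this legend can be illustrated with a short sketch. The code below is a hedged, standard-library Python example, not the authors' procedure: it assumes a single-exponential survival curve S(x) = exp(−x/tau), estimates tau by a least-squares line through the origin on ln S, and uses a simple percentile bootstrap. The function names, the plotting-position choice and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def fit_tau(run_lengths):
    """Estimate tau from the empirical survival curve of positive run lengths.

    Fits ln S(x) = -x / tau by least squares through the origin.
    """
    xs = sorted(run_lengths)
    n = len(xs)
    num = den = 0.0
    for i, x in enumerate(xs):
        # Midpoint plotting position keeps the survival fraction inside (0, 1).
        survival = 1.0 - (i + 0.5) / n
        num += x * math.log(survival)
        den += x * x
    slope = num / den  # equals -1 / tau for an exponential decay
    return -1.0 / slope


def bootstrap_tau(run_lengths, n_boot=1000, seed=0):
    """Return (tau, (lower, upper)) with a percentile 95% bootstrap interval."""
    rng = random.Random(seed)
    n = len(run_lengths)
    taus = sorted(
        fit_tau(rng.choices(run_lengths, k=n)) for _ in range(n_boot)
    )
    lower = taus[int(0.025 * n_boot)]
    upper = taus[int(0.975 * n_boot) - 1]
    return fit_tau(run_lengths), (lower, upper)


# Synthetic run lengths (micrometres) drawn from an exponential with tau = 2.
rng = random.Random(42)
synthetic_runs = [rng.expovariate(1 / 2.0) for _ in range(500)]
tau, (ci_low, ci_high) = bootstrap_tau(synthetic_runs)
print(f"tau = {tau:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

A maximum-likelihood estimate (the sample mean for an exponential) would be a reasonable alternative; the survival-curve fit is used here only because it mirrors the cumulative-frequency presentation.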
Extended Data Table 1 | Cryo-EM data collection and model refinement statistics
The model refinement statistics are reported for four different types of model, two including GDP-Mg2+ in the ROC domain and two excluding it. In each case, we report statistics for two types
of model: ‘Monomer w/interfaces’ consists of an LRRK2RCKW monomer plus fragments from the neighbouring monomers in the C3 trimer that were used during model building and refinement;
‘Top 10 monomers’ are the top-10 results from Rosetta Relax with the neighbouring fragments removed after processing in Rosetta. PDB accession numbers for the models and the EMD code
of the maps used for model-building and refinement are indicated. EMD-21250 contains both the C3 map of the LRRK2RCKW trimer used to build the COR-B, kinase and WD40 domains and the
signal-subtracted monomer used to build the ROC and COR-A domains. The final models reported here were refined into the signal-subtracted monomer map (Methods).
*, C3 reconstruction.
#, Signal-subtracted monomer.
a, WD40-mediated dimer.
b, COR-mediated dimer.
**, Numbers represent the average of the values for all 10 models.
Corresponding author(s): Samara L Reck-Peterson and Andres E Leschziner
Reporting Summary
Statistics

Software and code

Data collection For electron microscopy experiments, data were collected with Leginon. All the electron microscopy data collection software sources are referenced in the Methods section. For light microscopy experiments, data were collected with Nikon Elements Software (commercially available). For western blots, data were collected using Image Studio v5.2 (LI-COR).
Data analysis For electron microscopy experiments, data were processed with Appion, GCTTF, MotionCor2, FindEM, Cryolo, Relion3, and Cryosparc2. All the electron microscopy data processing software sources are referenced in the Methods section. For light microscopy experiments, data were analyzed with ImageJ, which was also used to make image z-maximum projections and kymographs. GraphPad Prism 8 was used for all statistical analysis of light microscopy data. For western blots, data were quantified using EmpiriaStudio software (LI-COR).
Data
October 2018

For electron microscopy experiments, maps and coordinates are deposited on the PDB and EMDB. No image sets or particle stacks will be made available. All the
raw data that went into the biochemical and cell biological analyses for Figures 4 and 5 (and associated Extended Data Figures) were deposited in a spreadsheet
with the manuscript.

Sample size For all experiments, we determined the sample size by following conventions in the field.
Data exclusions For single molecule kinesin experiments, clear bright aggregates (less than 5% of runs) were excluded from the analysis, as these runs display
longer run lengths than typical single-molecule kinesin runs (Brouhard, 2010, Methods Cell Biol). No conclusions change with the addition or
exclusion of these aggregates, and we would be happy to provide the data without exclusion of aggregates if deemed necessary.
Replication All single molecule experiments in Figures 4 and 5 (including dynein and kinesin data with or without drugs) were performed with at least two, but up to four technical replicates on different days (except EDF10g,h, which was performed with only one technical replicate). Major findings with kinesin and dynein single molecule data have been confirmed by two different protein preps. All cellular data from Figure 5 were quantified from at least four, but up to ten technical replicates (defined in the Methods as at least 20 cells per coverslip) and independent experiments were performed on multiple days as outlined in the Methods section.
Randomization This is not relevant. We have no data involving organisms or subjects that would require randomization.
Blinding For the cell biology data in Fig. 5c–f, the experimenter was blinded to conditions for both image acquisition and analysis of LRRK2 filaments.


Antibodies
Antibodies used mouse anti-GFP (Santa Cruz, clone: B-2, Cat: sc-9996, Lot: ); chicken anti-GFP (AvesLabs, Cat: GFP-1010, Lot: GFP879484); rabbit
anti-alpha-tubulin (ProteinTech, Cat: 11224-1-AP); mouse anti-GAPDH (ProteinTech, Cat: 60004-1-Ig)
Validation All antibodies used are well-validated and highly-specific commercially available antibodies. For LiCOR quantification, linear

Cell line source(s) HEK293T used were from ATCC (CRL-3216)
Authentication ATCC authenticated
Mycoplasma contamination Every new cell line we receive is tested for mycoplasma before expanding and freezing. After thawing, each cell line is tested
again. Once every three months, our lab tests all growing cells for mycoplasma as well. The cells we used in our experiments
were last tested on 10/16/19 and did not contain contamination.
Commonly misidentified lines None

Article
Structures and pH-sensing mechanism of the proton-activated chloride channel
Zheng Ruan1,4, James Osei-Owusu2,4, Juan Du1, Zhaozhu Qiu2,3 ✉ & Wei Lü1 ✉
https://doi.org/10.1038/s41586-020-2875-7
Accepted: 14 August 2020
Published online: 4 November 2020

The proton-activated chloride channel (PAC) is active across a wide range of mammalian cells and is involved in acid-induced cell death and tissue injury1–3. PAC has recently been shown to represent a novel and evolutionarily conserved protein family4,5. Here we present two cryo-electron microscopy structures of human PAC in a high-pH resting closed state and a low-pH proton-bound non-conducting state. PAC is a trimer in which each subunit consists of a transmembrane domain (TMD), which is formed of two helices (TM1 and TM2), and an extracellular domain (ECD). Upon a decrease of pH from 8 to 4, we observed marked conformational changes in the ECD–TMD interface and the TMD. The rearrangement of the ECD–TMD interface is characterized by the movement of the histidine 98 residue, which is, after acidification, decoupled from the resting position and inserted into an acidic pocket that is about 5 Å away. Within the TMD, TM1 undergoes a rotational movement, switching its interaction partner from its cognate TM2 to the adjacent TM2. The anion selectivity of PAC is determined by the positively charged lysine 319 residue on TM2, and replacing lysine 319 with a glutamate residue converts PAC to a cation-selective channel. Our data provide a glimpse of the molecular assembly of PAC, and a basis for understanding the mechanism of proton-dependent activation.
Acidic pH is crucial for the function of intracellular organelles in the secretory and endocytic pathways. It is also one of the pathological hallmarks of many diseases, including cerebral and cardiac ischaemia, cancer, infection and inflammation. The activity of PAC is stimulated by the lowering of extracellular pH and has been recorded in a wide range of mammalian cells1. By mediating the influx of Cl− and subsequent cell swelling, PAC is implicated in acid-induced cell death2,3. We and others recently used unbiased RNA interference screens4,5 to identify a novel gene, PACC1 (also known as TMEM206), that encodes the PAC channel. Our functional studies revealed that PAC has a key role in acid-induced neuronal cell death in vitro and in ischaemic brain injury in mice4,6.
With no obvious sequence homology to other membrane proteins, PAC represents a completely new family of ion channels4,5. PAC is highly conserved across vertebrates and is predicted to have two transmembrane helices4,5, similar to the acid-sensing ion channel (ASIC) and the epithelial sodium channel (ENaC)7,8. Although the structure and function of ASIC have been extensively studied7,9–12, the architecture of PAC and the mechanisms that underlie its pH sensing and anion selectivity are unknown. To address these questions, we determined structures of human PAC using single-particle cryo-electron microscopy (cryo-EM) combined with patch-clamp electrophysiological studies.

Structural determination
PAC is activated at a pH below 5.5 at room temperature, and is maximally stimulated by protons at around pH 4.6–4 (ref. 1). We determined cryo-EM structures of PAC reconstituted in lipid nanodiscs at pH 8 (pH8-PAC) and pH 4 (pH4-PAC) with estimated resolutions of 3.60 and 3.73 Å, respectively (Extended Data Figs. 1a–c, 2, 3, Extended Data Table 1). The maps were of sufficient quality to carry out de novo model building of the protein (Fig. 1, Extended Data Fig. 4a–d, Extended Data Table 1). The cytoplasmic N and C termini (residues 1–60 and 339–350 in pH8-PAC and 1–52 and 340–350 in pH4-PAC) are disordered in our cryo-EM maps.

Overall architecture
PAC is a trimer. It has a small, ball-shaped ECD sitting on top of a slim and elongated TMD that contains two transmembrane helices (TM1 and TM2) in each subunit (Fig. 1a, b, e, f). This trimeric two-transmembrane-helix architecture is reminiscent of ASIC (Extended Data Fig. 5a–e) and ENaC7,8. The ECD of PAC is heavily glycosylated, with four N-glycosylation sites in each subunit (Fig. 1c, g), consistent with a previous report5 and a deglycosylation assay (Extended Data Fig. 1d).
Alkaline and acidic pH yielded two PAC structures with distinct shapes: pH4-PAC is shorter and bulkier than pH8-PAC, and they differ mainly at the TMD and the ECD–TMD interface. At pH 8, the TM1 helix runs nearly parallel to and forms interactions only with its cognate TM2 (Fig. 1b, d). When the pH drops to 4, TM1 switches its interaction from its cognate TM2 to the adjacent TM2 (Fig. 1f, h). This domain-swapped movement of TM1 has not, to our knowledge, been observed in any
1Department of Structural Biology, Van Andel Institute, Grand Rapids, MI, USA. 2Department of Physiology, Johns Hopkins University School of Medicine, Baltimore, MD, USA. 3Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, USA. 4These authors contributed equally: Zheng Ruan, James Osei-Owusu. ✉e-mail: zhaozhu@jhmi.edu; wei.lu@vai.org

[Fig. 1 image: panels a–d (pH8-PAC) and e–h (pH4-PAC). Labels include disulfide bond, ECD, TMD, ECD–TMD interface, TM1, TM2, TM1–β1 linker, β14–TM2 linker, pre-TM2, H98 and NAG (N148, N155, N162, N190). Dimensions shown: 18.4 Å, 57.7 Å, 10.4 Å, 27.8 Å and 30.2 Å (a–d); 28.9 Å, 50.6 Å, 25.9 Å, 8.9 Å and 30.8 Å (e–h).]
Fig. 1 | Overall architecture of PAC. a, e, Cryo-EM maps of pH8-PAC and pH4-PAC viewed parallel to the membrane. The map refined without using a mask is shown as a transparent envelope. The horizontal dimension of the ECD–TMD interface is represented by the distance between the Cα atoms of adjacent His98 residues. The density for His98 is coloured in yellow in the pH8-PAC (a) and pH4-PAC (e) maps. b, f, Atomic models of pH8-PAC and pH4-PAC. The green subunit is shown as a cartoon and the other two subunits are shown in surface representation. The distances between the centre of mass of the ECD and of the TMD in pH8-PAC (b) and pH4-PAC (f) are shown on the right. c, g, The ECDs of pH8-PAC and pH4-PAC viewed from the extracellular side. Four putative glycosylation sites (Asn148, Asn155, Asn162 and Asn190) are labelled in c. The distances between the centre of mass of the ECDs of each subunit in pH8-PAC (c) and pH4-PAC (g) are shown by the triangles. NAG, N-acetyl-D-glucosamine. d, h, The TMDs of pH8-PAC and pH4-PAC viewed from the intracellular side. A light salmon arrow in d indicates the rotation of TM1 of PAC after acidification to pH 4. The relative position and distance of TM1 and TM2 in pH8-PAC (d) and pH4-PAC (h), which are represented by the Cα atoms of Ile73 and Lys319, respectively, are shown at the bottom. The double-headed arrow indicates the interaction between TM1 and TM2.
other two-transmembrane-helix channels12,13, implying a novel gating mechanism.
The ECD–TMD interface consists of part of the TM1 helix on the extracellular side and two linkers that connect the TMD and the ECD: the TM1–β1 linker and the β14–TM2 linker. This interface differs substantially between the two PAC structures (Fig. 1b, f). At pH 8, TM1 and the short TM1–β1 linker hold the adjacent ECD through an ‘anchor’ residue, His98, while the β14–TM2 linker is extended as a loop close to the pore axis (Fig. 1b). At pH 4, the β14–TM2 linker is remodelled into a short pre-TM2 helix, and the TM1–β1 linker moves outward, causing a vertical compression and an expansion of the ECD–TMD interface (Fig. 1a, b, e, f, Supplementary Video 1). In tandem with the rearrangement of the ECD–TMD interface, the ECD at pH 4 shows a vertical movement towards the TMD and a contraction towards the pore axis, resulting in a shorter overall structure and a more compact ECD in comparison to that at pH 8 (Fig. 1b, c, f, g).

Structure of single subunits
At pH 8, each PAC protomer adopts an arm-like structure, with the ECD as the hand, the ECD–TMD interface as the wrist and the TMD as the forearm (Extended Data Fig. 4e). The hand-like ECD is composed of a palm, a finger, a thumb and a β-ball domain, all of which consist of β-strands except for the thumb domain, which contains two short α-helices (Extended Data Fig. 4e–g). The finger and the β-ball domains are connected by a disulfide bond (Cys128–Cys149), forming a rigid structure that occupies the peripheral region of the ECD. The connection between the ECD and the TMD is achieved by the TM1–β1 and β14–TM2 linkers, which together form the wrist domain.
The TMD consists of two transmembrane helices, TM1 and TM2, at the N terminus and C terminus of the protein, respectively. TM1 contains mostly hydrophobic residues and makes direct contacts with the lipid bilayer. TM2 contains both hydrophilic and hydrophobic residues and lines the ion-conducting pore. Although the ECD mostly maintains its conformation in both pH states, the ECD–TMD interface and TMD differ substantially at pH 4 and pH 8, characterized by the distinct conformations of His98 and TM1 (Extended Data Fig. 4e, f). At pH 8, TM1 is approximately parallel to TM2, whereas at pH 4, the two transmembrane helices form an angle of 64°.
The TM2 helix of PAC is a continuous α-helix and differs from the ASIC TM2, which has a characteristic two-segment structure and a Gly-Ala-Ser belt10 (Extended Data Fig. 5b, e). The ECD of PAC shows strong similarities to the β-sheet core of the ECD in ASIC, despite sharing limited protein-sequence identity (Extended Data Figs. 5e, 6). Notably, the PAC ECD lacks the large exterior helical structures of the ASIC ECD, which are involved in pH sensing14 (Extended Data Figs. 5a–d, 6), so PAC must have a different pH-sensing mechanism.

Channel assembly
The major interactions between PAC subunits occur at the ECD, the ECD–TMD interface and in the upper part of the TMD. The lower part

[Fig. 2 image: panels a–e (pH8-PAC) and f–j (pH4-PAC). Labels include upper ECD, lower ECD, ECD–TMD interface, TMD, palm, finger, β-ball, acidic pocket, β14–TM2 linker, pre-TM2, TM1–TM2 interface and TM2–TM2 interface.]
Fig. 2 | Intersubunit interfaces. Each row represents the same view of pH8-PAC (top row; a–e) and pH4-PAC (bottom row; f–j). a, f, The overall structure of PAC shown in cartoon and surface representation. The ECD is divided into the upper ECD and lower ECD for discussion. b, g, The upper ECD viewed from the extracellular side. Phe196, which mediates the intersubunit interaction in the upper ECD, is shown as spheres. c, h, The lower ECD viewed from the extracellular side. At pH 8, Met101 at the beginning of the β1 strand is in the centre of the lower ECD (c, bottom right). At pH 4, the lower ECD undergoes a clockwise inward rotation so that Val103 in the middle of the β1 strand moves to the centre of the lower ECD (h, bottom right). d, i, The ECD–TMD interface viewed parallel to the membrane. e, j, The interaction interfaces at the TMD.
of the TMD lacks extensive interactions and is thus flexible. Analysis of the conformational changes at the ECD revealed a rigid-body contraction of the entire ECD and an iris-like rotation of the lower ECD (Supplementary Video 1). We thus looked at both the upper ECD (finger and β-ball domains) and the lower ECD (palm and thumb domains) to study their intersubunit interfaces (Fig. 2a, f).
At pH 8, both the upper and the lower ECD have loose intersubunit interfaces with an obvious gap between subunits (Fig. 2b, c). The upper ECD has two major intersubunit interactions. The first is formed between the adjacent β8 and β12 strands that run approximately parallel to each other. The second is formed between the β6–β7 linker and the adjacent finger domain, where the Phe196 residue on the β6–β7 linker is inserted into the finger domain, forming both hydrophobic and cation–π interactions (Fig. 2b, g, Extended Data Fig. 1e). Substitution of Phe196 with an alanine residue (F196A) yielded a misassembled mutant protein that exhibited a markedly decreased channel activity compared to the wild type (Extended Data Fig. 1f, g). This suggests that the upper ECD has an important role in channel assembly. The lower ECD has a single major interface at the centre, where the three Met101 residues on the N terminus of the β1 strand tightly interact with each other. This interface disconnects the central pore from the ECD to the TMD (Fig. 2c). At pH 4, the gaps in both the upper and the lower ECDs are mostly filled, creating extensive interactions between subunits (Fig. 2g, h). Moreover, in the centre of the lower ECD, owing to the iris-like rotation of the ECD from pH 8 to pH 4, the Val103 residue in the middle of the β1 strand now mediates the contact (Fig. 2h).
The intersubunit contact at the ECD–TMD interface is mediated through His98 at the TM1–β1 linker. At pH 8, His98 is surrounded by hydrophilic and hydrophobic residues of the β1 and β14 strands of the adjacent ECD, constituting a resting ECD–TMD interface (Fig. 2d). At pH 4, this interface is remodelled; His98 interacts with a pocket formed by residues in the β10–β11 linker of its cognate ECD and in the β1–β2 linker of the adjacent ECD (Fig. 2i). Because this pocket is constructed solely of negatively charged residues, we call it an ‘acidic pocket’.
At the TMD, PAC has two major intersubunit interfaces (Fig. 2e, j): the TM1–TM2 interface and the TM2–TM2 interface. At pH 8, both interfaces are near the extracellular part of the TMD. At pH 4, the TM2–TM2 interface remains mostly unchanged, but the TM1–TM2 interface slips towards the intracellular side as a result of the domain-swapped movement of TM1.

Ion-conducting pathway and selectivity
The PAC channel has a central pore along the symmetry axis with wide openings at the extracellular and intracellular ends in both pH states (Fig. 3a–d). Within the TMD, the ion-conducting pore is lined by TM2 and the β14–TM2 linker (Fig. 3b, d). At pH 8, the ion-conducting pore is occluded at several positions (Fig. 3e, g), thus representing a high-pH resting closed state. At pH 4 (Fig. 3f, g), the intracellular part of the pore has an enlarged radius of 0.82 Å, but is still not wide enough to allow the permeation of Cl− ions, thus representing a low-pH protonated non-conducting state. We found that PAC exhibits a strong outward rectification such that either the open probability or the single-channel conductance is low at 0 mV, the voltage at which the cryo-EM structures were determined (Fig. 3h). Moreover, PAC showed a marked desensitization after prolonged treatment with a pH 4 solution (Extended Data Fig. 7a–f). Therefore, we suggest that a closed pore in the pH4-PAC structure represents either a pre-open state or a desensitized state.
To reveal the molecular determinants that are responsible for the anion selectivity of PAC, we examined the positively charged residues within the ion-conducting pore, all of which are located in the intracellular half of the TM2 helix. The Lys319 residue appears to be an ideal candidate, because it is immediately below the intracellular restriction site (Leu315) and forms a positively charged ‘triad’ around the intracellular entry point (Fig. 3e, f). In line with this hypothesis, we found that a charge-reversing mutation (K319E) converted PAC from an anion-selective to a cation-selective channel with a pronounced inwardly rectifying current (Fig. 3h–j, Extended Data Fig. 8a, b). By contrast, mutation of two other lysine residues, K325E and K329E, resulted in mutant proteins that behaved similarly to wild-type PAC (Extended Data Fig. 8a–e). The crucial role of Lys319 is further supported by its conservation across species (Extended Data Fig. 6), and by the fact that PAC(K319C) is not functional5. Together, our data provide evidence that Lys319 is the determinant of anion selectivity for PAC.
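To make the link between reversal potential and anion versus cation selectivity concrete, the sketch below applies the standard Goldman–Hodgkin–Katz voltage equation to a NaCl gradient like the one described for Fig. 3h (150 mM NaCl in the pipette, 15 mM in the bath). This is an assumption-laden illustration rather than the authors' calculation, which is not spelled out in this section: it assumes that Na+ and Cl− are the only permeant ions, that concentrations stand in for activities, and a temperature of 22 °C. The function names and the permeability ratios used are hypothetical.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1


def reversal_potential_mV(p_cl_over_p_na, c_in, c_out, temp_K=295.15):
    """GHK reversal potential (mV) for NaCl at c_in (pipette) and c_out (bath)."""
    r = p_cl_over_p_na
    ratio = (c_out + r * c_in) / (c_in + r * c_out)
    return 1000.0 * R * temp_K / F * math.log(ratio)


def permeability_ratio(v_rev_mV, c_in, c_out, temp_K=295.15):
    """Invert the GHK equation to recover PCl/PNa from a reversal potential.

    Undefined (division by zero) when c_in == c_out, because symmetric
    solutions give 0 mV whatever the selectivity.
    """
    e = math.exp(v_rev_mV / 1000.0 * F / (R * temp_K))
    return (c_out - e * c_in) / (e * c_out - c_in)


# Hypothetical ratios: an anion-selective and a cation-selective channel.
for ratio in (5.0, 0.1):
    v = reversal_potential_mV(ratio, c_in=150.0, c_out=15.0)
    print(f"PCl/PNa = {ratio}: Vrev = {v:.1f} mV")
```

Under these assumptions, lowering bath NaCl shifts the reversal potential positive for an anion-selective channel and negative for a cation-selective one, which is the qualitative signature used to distinguish wild-type PAC from PAC(K319E).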

[Fig. 3 image: panels a–j. Labels include vestibule, fenestration, β14–TM2 linker, ECD–TMD seal, upper gate and lower gate. Axes: pore radius (Å) against distance along pore axis (Å) at pH 8 and pH 4 (g); I (nA) against V (mV) at 150 mM and 15 mM NaCl (h); Vrev (mV) (i); PCl/PNa (j), each for WT and K319E.]
Fig. 3 | Ion-conducting pathways and anion selectivity. a, c, pH8-PAC (a) and pH4-PAC (c) in surface representation, coloured according to the electrostatic surface potential from −3 to 3 kT/e (red to blue). Titratable residues are assigned to their predominant protonation state at pH 8 (a) or pH 4 (c) based on PROPKA. b, d, The pore profiles of pH8-PAC (b) and pH4-PAC (d) models along the symmetry axis. Pore-lining residues are shown. e, f, Enlargements of the boxed areas in b (e) and d (f), respectively. The positions of the ECD–TMD seal and fenestration site are labelled. g, Pore radius plots of the profiles in e, f. h, The representative current (I)–voltage (V) relationship for wild-type (WT) PAC and PAC(K319E). The pipette solution contains 150 mM NaCl; the bath solution contains 150 mM (black) or 15 mM (red) NaCl. i, The reversal potential (Vrev) of wild-type PAC and PAC(K319E) from recordings in h (n = 16 (wild type) and n = 7 (K319E)). Data are mean ± s.e.m. Individual data points are shown as dots. j, The relative Cl−/Na+ permeability (PCl/PNa) for wild-type PAC (n = 15) and PAC(K319E) (n = 11) calculated from current induced at pH 5 and 100 mV. Data are mean ± s.e.m. of the permeability ratio. Individual data points are shown as solid dots. The average PCl/PNa permeability values are indicated at the top for each construct.

[Fig. 4 image: panels a–e. Labels include resting interface, acidic pocket, 3 Å, 4.6 Å, 22.8° and 47.4°. Axes: normalized current against pH (d) and pH50 (e) for WT, H98R, H98A, Q296A and E107R.]
Fig. 4 | Mechanisms of pH sensing and channel activation. a, Superposition of a single subunit of pH8-PAC (blue) and pH4-PAC (red) aligned using the ECD palm domain. The 3 Å centre-of-mass distance indicates the rigid-body movement of the ECD. b, Close-up view of the conformational change in the ECD–TMD interface in a. Structural elements and residues in the pH4-PAC structure are labelled with a prime symbol. Residues from adjacent subunits are coloured in bright and light colours, respectively. At pH 8, His98 interacts with Gln296. At pH 4, the side chain of His98 interacts with an acidic pocket. c, Comparison of the TMD viewed from the intracellular side. The structures of pH8-PAC (blue) and pH4-PAC (red) are aligned using the ECD. d, pH dose–response curve of wild-type PAC and various PAC mutants. Data are mean ± s.e.m. of the current at 100 mV, normalized to pH-4.6-induced current (n = 10 (wild type, H98R, E107R), n = 9 (H98A) and n = 11 (Q296A)). The Hill coefficients for wild-type PAC and PAC(E107R) are 2.44 ± 0.18 and 1.18 ± 0.19 (mean ± s.e.m.), respectively. e, pH50 value estimated from the pH dose–response curve. The centre and error bar represent the estimated pH50 value and s.e.m. from the nonlinear fitting in d. A one-way analysis of variance (ANOVA) with Bonferroni post-hoc test was used to determine the significance (P values are indicated).

In the ECD, the pore along the symmetry axis has a large vestibule in the middle (Fig. 3b, d). This vestibule is constricted at the ECD–TMD interface by an ECD–TMD seal in both the pH8-PAC and the pH4-PAC structures (Fig. 3e–g, Extended Data Fig. 8f, i). This leads to the question of how ions might enter the ion-conducting pore from the extracellular side. Just below the seal, we observed three lateral fenestrations that connect to the central pore. Fenestrations at similar locations have been defined as an ion-entry point in both the ASIC and P2X channels9,13,15. At pH 8, the fenestration in PAC is formed by the extracellular portion of the TM1 helix and the β14–TM2 linker of the adjacent subunit (Extended Data Fig. 8f). The entrance is surrounded by several negatively charged residues, making it unfavourable for conducting anions (Extended Data Fig. 8g). At pH 4, a different fenestration is established by the β1 strand and the pre-TM2 helix in the adjacent subunit (Extended Data Fig. 8i). The fenestration at pH 4 is wider than that at pH 8, and has several positively charged residues lining the entry point, rendering it favourable for anions (Extended Data Fig. 8i, j). To provide evidence that these fenestrations could be extracellular ion-entry points in PAC, we performed molecular dynamics simulations and found that the fenestrations in the pH4-PAC structure are hydrated, whereas those in the pH8-PAC structure are less accessible to solvent (Extended Data Fig. 8h, k). Our data suggest that the lateral fenestrations could be an extracellular ion-entry point that is common to two-transmembrane-helix channels9,13,15. This agrees with a previous report in which treatment with a thiol-reactive reagent, MTSES, partially inhibited the ion-channel activity of PAC when Thr306, which is part of the fenestration, was replaced by a cysteine residue5.

Mechanisms of pH sensing and channel activation
To elucidate the pH-sensing mechanism of PAC, we compared the structures of pH8-PAC and pH4-PAC. We focused on the ECD and the ECD–TMD interface, because PAC is activated by extracellular acid. Superimposing a single subunit revealed that, upon protonation, the major motion of the extracellular region occurred at the ECD–TMD interface, whereas the ECD showed minor rigid-body movement (Fig. 4a). This suggests that the ECD–TMD interface probably participates in pH sensing. We hypothesized that the His98 residue in the TM1–β1 linker is one of the key pH sensors, because it showed a large movement from the high-pH resting state to the low-pH proton-bound state and because its side-chain pKa value is close to the pH50 (pH of half-maximal activation) value of PAC16 (Extended Data Fig. 9a).
At pH 8, His98 is in close contact with the Gln296, Ile298 and Ser102 residues of the adjacent ECD. We speculated that the side chain of His98 forms a hydrogen bond with the side-chain amine group of Gln296, which locks the TM1 helix in a conformation parallel to its cognate TM2 helix (Fig. 4a, b). To investigate whether the interaction between

Article
His98 and Gln296 is critical for stabilizing the channel in a resting differ from that in an open state. Indeed, cysteine mutants of multiple
closed state, we engineered a disulfide bond connecting these two pore-lining residues in TM2 (for example, Ala316, Leu315, Gly312) can
residues. Indeed, double mutation of both His98 and Gln296 to cysteine still be accessed by the thiol-reactive reagent MTSES from the extracel-
(H98C/Q296C) fixed His98 in the resting position and thus rendered lular side5, indicating that the ion-conducting pore and lateral fenestra-
PAC insensitive to pH, whereas the control serine double mutant (H98S/ tions in an open state are probably substantially larger than in either
Q296S) showed increased pH sensitivity (Extended Data Fig. 9b–f). At of the present structures. Further studies are required to develop a
pH 4, the protonated His98 is decoupled from Gln296 and flipped into thorough understanding of this family of proton-sensitive ion channels.
the acidic pocket, which is 4.6 Å away (Fig. 4b). The acidic pocket is
formed by Glu107, Asp109 and Glu250, and interacts favourably with
the protonated His98 residue because Asp109 is predicted to remain Online content
unprotonated at pH 4 (Extended Data Fig. 9a). The flipping of His98 Any methods, additional references, Nature Research reporting sum-
pulls TM1 away from its cognate TM2 and creates a new interface with maries, source data, extended data, supplementary information,
the TM2 of the adjacent subunit (Fig. 4b). Concurrent with a 47.4° swing acknowledgements, peer review information; details of author con-
of TM1, the pore-lining TM2 undergoes a counterclockwise rotation of tributions and competing interests; and statements of data and code
22.8° when viewed from the intracellular side (Fig. 4b, c). availability are available at https://doi.org/10.1038/s41586-020-2875-7.
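
The protonation argument in this paragraph can be made concrete with the Henderson–Hasselbalch relation. The Python sketch below computes the fraction of a titratable side chain that is protonated at a given pH. The pKa of 6.0 is a generic textbook value for a histidine side chain, used here purely as an assumption for illustration; it is not the PROPKA prediction reported in Extended Data Fig. 9a, and the function name is hypothetical.

```python
def protonated_fraction(ph: float, pka: float) -> float:
    """Fraction of a titratable group in its protonated form.

    Follows from the Henderson-Hasselbalch relation:
    [HA] / ([HA] + [A-]) = 1 / (1 + 10 ** (pH - pKa)).
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Assumed, illustrative pKa for a histidine side chain (not taken from the paper).
ASSUMED_HIS_PKA = 6.0

for ph in (8.0, 6.0, 4.0):
    fraction = protonated_fraction(ph, ASSUMED_HIS_PKA)
    print(f"pH {ph:.1f}: protonated fraction = {fraction:.3f}")
```

Under this assumed pKa, the side chain is almost entirely deprotonated at pH 8 and almost entirely protonated at pH 4, which is the regime in which a residue with a pKa near the pH50 could plausibly act as a sensor.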
We hypothesized that the flipping of His98 from the resting position
to the acidic pocket is a critical element for the proton-induced acti- 1. Capurro, V. et al. Functional analysis of acid-activated Cl− channels: properties and
mechanisms of regulation. Biochim. Biophys. Acta 1848, 105–114 (2015).
vation of the PAC channel. To test this hypothesis, we first generated
2. Wang, H. Y., Shimizu, T., Numata, T. & Okada, Y. Role of acid-sensitive outwardly rectifying
mutants of His98 and its interacting partner in the pH8-PAC structure anion channels in acidosis-induced cell death in human epithelial cells. Pflugers Arch.
(Gln296) and examined their pH sensitivity (Fig. 4d, e). H98R, H98A and 454, 223–233 (2007).
3. Sato-Numata, K., Numata, T. & Okada, Y. Temperature sensitivity of acid-sensitive
Q296A mutants all resulted in an increased pH sensitivity by disengag-
outwardly rectifying (ASOR) anion channels in cortical neurons is involved in
ing the hydrogen bond between His98 and Gln296, which supports hypothermic neuroprotection against acidotoxic necrosis. Channels 8, 278–283 (2014).
the idea that the decoupling of protonated His98 from the resting 4. Yang, J. et al. PAC, an evolutionarily conserved membrane protein, is a proton-activated
chloride channel. Science 364, 395–399 (2019).
interface has a role in channel activation. However, the H98A and H98R
5. Ullrich, F. et al. Identification of TMEM206 proteins as pore of PAORAC/ASOR
mutants showed a similar pH50 value, which is unexpected because an acid-sensitive chloride channels. eLife 8, e49187 (2019).
alanine residue would be less attracted by the acidic pocket than an 6. Osei-Owusu, J., Yang, J., Del Carmen Vitery, M., Tian, M. & Qiu, Z. PAC proton-activated
chloride channel contributes to acid-induced cell death in primary rat cortical neurons.
arginine residue. This might be because an arginine residue at position
Channels 14, 53–58 (2020).
98 caused additional conformational changes rather than a simple 7. Jasti, J., Furukawa, H., Gonzales, E. B. & Gouaux, E. Structure of acid-sensing ion channel 1
side-chain substitution. Next we studied Glu107, which is close to His98 at 1.9 A resolution and low pH. Nature 449, 316–323 (2007).
8. Noreng, S., Bharadwaj, A., Posert, R., Yoshioka, C. & Baconguis, I. Structure of the human
in the pH4-PAC structure. Mutation of Glu107 to arginine (E107R) not epithelial sodium channel by cryo-electron microscopy. eLife 7, e39340 (2018).
only markedly increased the pH sensitivity but also decreased the Hill 9. Gonzales, E. B., Kawate, T. & Gouaux, E. Pore architecture and ion sites in acid-sensing ion
coefficient of the pH dose–response curve (Fig. 4d, e). As Glu107 has channels and P2X receptors. Nature 460, 599–604 (2009).
10. Baconguis, I., Bohlen, C. J., Goehring, A., Julius, D. & Gouaux, E. X-ray structure of
a predicted pKa value of around 6 in the pH4-PAC structure (Extended acid-sensing ion channel 1-snake toxin complex reveals open state of a Na+-selective
Data Fig. 9a), it probably also participates in PAC pH sensing. The E107R channel. Cell 156, 717–729 (2014).
mutation might cause a rearrangement of the acidic pocket, which 11. Yoder, N. & Gouaux, E. Divalent cation and chloride ion sites of chicken acid sensing ion
channel 1a elucidated by X-ray crystallography. PLoS ONE 13, e0202134 (2018).
could lead to an altered interaction with His98. Consequently, the 12. Yoder, N., Yoshioka, C. & Gouaux, E. Gating mechanisms of acid-sensing ion channels.
activation of the channel might require fewer protons, resulting in an Nature 555, 397–401 (2018).
increased pH sensitivity. 13. Mansoor, S. E. et al. X-ray structures define human P2X3 receptor gating cycle and
antagonist action. Nature 538, 66–71 (2016).
14. Vullo, S. et al. Conformational dynamics and role of the acidic pocket in ASIC
pH-dependent gating. Proc. Natl Acad. Sci. USA 114, 3768–3773 (2017).
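
To illustrate how the Hill coefficient shapes the pH dose–response curve discussed above, the following Python sketch evaluates the four-parameter equation given in the Methods, Y = bottom + (top − bottom)/(1 + 10^((X − pH50) × HillSlope)), where X is taken to be the extracellular pH. The pH50 and Hill slope values are hypothetical placeholders chosen for illustration; they are not fitted values from this study.

```python
def ph_dose_response(ph, ph50, hill_slope, top=1.0, bottom=0.0):
    """Normalized current predicted at a given extracellular pH.

    Uses the four-parameter form stated in the Methods:
    Y = bottom + (top - bottom) / (1 + 10 ** ((X - pH50) * HillSlope)).
    """
    return bottom + (top - bottom) / (1.0 + 10.0 ** ((ph - ph50) * hill_slope))


# Hypothetical parameter sets (assumptions for illustration only).
steep = {"ph50": 5.0, "hill_slope": 3.0}
shallow = {"ph50": 5.0, "hill_slope": 1.0}

for ph in (4.0, 4.5, 5.0, 5.5, 6.0):
    y_steep = ph_dose_response(ph, **steep)
    y_shallow = ph_dose_response(ph, **shallow)
    print(f"pH {ph:.1f}: steep = {y_steep:.3f}, shallow = {y_shallow:.3f}")
```

Both curves pass through half-maximal current at pH50; a smaller Hill slope gives a shallower transition around that point.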
Discussion
15. Gao, C. et al. Roles of the lateral fenestration residues of the P2X4 receptor that contribute
to the channel function and the deactivation effect of ivermectin. Purinergic Signal. 11,
Our work on PAC provides a glimpse of the molecular structures 229–238 (2015).
and pH-sensing mechanism of a proton-activated chloride chan- 16. Lambert, S. & Oberwinkler, J. Characterization of a proton-activated, outwardly rectifying
nel (Extended Data Fig. 9g). Similarly to ASIC12,14,17–19, the pH-sensing anion channel. J. Physiol. 567, 191–213 (2005).
17. Liechti, L. A. et al. A combined computational and functional approach identifies new
mechanism of PAC is almost certainly determined by multiple resi- residues involved in pH-dependent gating of ASIC1a. J. Biol. Chem. 285, 16315–16329 (2010).
dues, because several tested mutations altered the pH sensitivity but 18. Smith, E. S. J., Zhang, X., Cadiou, H. & McNaughton, P. A. Proton binding sites involved in
none of them abolished it, and because titratable residues are dis- the activation of acid-sensing ion channel ASIC2a. Neurosci. Lett. 426, 12–17 (2007).
19. Paukert, M., Chen, X., Polleichtner, G., Schindelin, H. & Gründer, S. Candidate amino acids
tributed throughout the ECD (Extended Data Fig. 9a). Our structural involved in H+ gating of acid-sensing ion channel 1a. J. Biol. Chem. 283, 572–581 (2008).
and functional data suggest that the pH4-PAC structure represents a
proton-bound pre-open state or a proton-bound desensitized state. We
acknowledge the limits of using a proton-bound non-conducting con-
formation to discuss the activation mechanism, because the TMD may © The Author(s), under exclusive licence to Springer Nature Limited 2020

Methods
(SEC) using TBS as the running buffer. Peak fractions were combined
and concentrated to 4 mg ml–1 before grid freezing.
Data reporting
No statistical methods were used to predetermine sample size. The Cryo-grid preparation
experiments were not randomized and the investigators were not The purified PAC nanodisc protein is in TBS buffer at pH 8. The pH 4
blinded to allocation during experiments and outcome assessment. condition was made by adding 1 M acetic acid (pH 3.5) buffer to the
purified PAC nanodisc sample at a 1:20 v/v ratio. Fluorinated octyl
Constructs and mutagenesis maltoside (0.5 mM) was added to the sample to help reduce protein
For protein expression and purification, the human PAC gene (PACC1; unfolding in the air–water interface. Quantifoil grids (Au 1.2/1.3 or Au
UniProt ID Q9H813) from our previous study4 was subcloned into a pEG 2/1, 300 mesh) were glow-discharged for 30 s. A cryo-grid was made
BacMam vector20 with a thrombin cutting site, enhanced green fluores- using the Vitrobot Mark III kept at 18 °C and 100% humidity. A volume
cent protein (eGFP) and 8×His tag in the C terminus. The pIRES2-eGFP of 2.5 μl of the PAC nanodisc protein sample was loaded to the grid,
vector containing PACC1 was used for whole-cell patch-clamp record- blotted for 1.5 s, plunge-frozen into liquid ethane and transferred into
ings4. Site-directed mutagenesis was performed using the QuikChange liquid nitrogen for storage.
site-directed mutagenesis protocol (Agilent) and confirmed by Sanger
sequencing. Cryo-EM data collection
Cryo-EM data were collected using a FEI Titan Krios transmission elec-
Mammalian cell culture, protein expression, purification and tron microscope equipped with a Gatan K2 Summit direct electron
nanodisc reconstitution detector. Automated data acquisition was facilitated by SerialEM
For small-scale protein expression, adherent tsA201 cells were main- software in super-resolution counting mode22. Each raw movie stack
tained in Dulbecco’s modified Eagle medium (DMEM) supplemented consists of 40 frames with a total dose of 49.6 e–/Å2 for 8 s. Nominal
with 10% fetal bovine serum (FBS) at 37 °C. When the cell density defocus values were set to range from −1.2 to −1.9 μm.
reached approximately 80% confluence, transient transfection was
performed by incubating the plasmid DNA and Lipofectamine-2000 Single-particle data analysis
reagent (Thermo Fisher Scientific) in Opti-MEM medium (Thermo For both the pH8-PAC and the pH4-PAC dataset, raw movies were first
Fisher Scientific) using manufacturer-provided protocols. Sodium motion-corrected using MotionCor v.1.2.1 (ref. 23). The contrast transfer
butyrate (10 mM) was added to the adherent cells 24 h after transfec- function (CTF) of each micrograph was estimated using Gctf v.1.06
tion. The cells were then maintained at 30 °C to boost protein expres- (ref. 24) or CTFFIND25. Template-based particle picking was conducted
sion. The next day, the adherent cells were washed with 20 mM Tris, using Gautomatch v.0.53 (https://www2.mrc-lmb.cam.ac.uk/down-
150 mM NaCl, pH 8.0 (TBS) buffer, collected and stored at −80 °C. load/gautomatch-053/). Junk particles were sorted by two rounds of
For large-scale protein expression, we used the Bac-to-Bac baculo- two-dimensional (2D) classification in RELION v.3.0 (ref. 26).
virus expression system21. Specifically, plasmid expressing wild-type For the pH8-PAC dataset, particles belonging to the 2D class aver-
PAC was transfected into DH10α cells to produce the bacmid. Purified ages with features were selected for ab initio three-dimensional (3D)
bacmid was transfected into adherent Sf9 cells using Cellfectin II rea- reconstruction in cryoSPARC v.0.6.5 (ref. 27). The resulting map was then
gent (Thermo Fisher Scientific) to produce P1 virus. P2 virus was then used as the template for 3D classification using RELION v.3.0 with C1
generated by infecting suspension Sf9 cells with the P1 virus at a 1:5,000 symmetry. Class averages with high-resolution features were combined
(v/v) ratio. The expression of PAC protein was induced by infecting and refined by imposing C3 symmetry. A solvent mask was generated
tsA201 suspension cells in FreeStyle 293 medium (Gibco) with 7.5% P2 and was used for all subsequent refinement steps. Bayesian polishing
virus. After 8–12 h, sodium butyrate (5 mM) was added to the infected was conducted to refine the beam-induced motion of the particle set,
suspension cells and the temperature was adjusted to 30 °C. Suspension resulting in a map at 4.0-Å resolution28.
tsA201 cells were collected 70 h after infection and stored at −80 °C. We noticed that the size of the nanodiscs was not homogeneous,
Mammalian cells infected with PAC were suspended in TBS buffer which could lead to inaccuracy of particle alignment. In addition, the
(150 mM NaCl, 20 mM Tris HCl, pH 8.0) supplemented with a protease cytosolic side of the TMD is flexible, which could also influence par-
inhibitor cocktail (1 mM phenylmethylsulfonyl fluoride, 2 mM pepsta- ticle alignment. To address these potential problems, we subtracted
tin, 0.8 μM aprotinin and 2 μg ml–1 leupeptin) and lysed by sonication. the nanodisc signal and further classified the particles without image
Cell debris was removed by centrifugation at 3,000g for 10 min. The alignment. A subsequent 3D refinement allowed us to obtain a map at
membrane fraction was pooled by ultracentrifugation of the superna- 3.60 Å for PAC. Likewise, we also attempted to only include signals from
tant for 1 h at 186,000g. The membrane was then Dounce-homogenized the ECD and part of the TMD close to the ECD. This strategy allowed us
and solubilized in TBS buffer with 1% glyco-diosgenin (GDN) and the pro- to obtain a reconstruction at 3.36 Å. The pH 8 data-processing workflow
tease inhibitor cocktail. After 1 h, the sample was ultracentrifuged for 1 h is summarized in Extended Data Fig. 2a.
at 186,000g. The supernatant was applied to 2 ml talon resin preequili- For the pH4-PAC dataset, the initial 3D classification was performed
brated with TBS buffer with 0.02% GDN. The resin was washed with using the pH8-PAC map (low-pass-filtered to 50 Å) as the reference
20 ml TBS buffer with 0.02% GDN and 20 mM imidazole. The protein without imposing symmetry. Particles belonging to 3D classes with
was eluted with 8 ml TBS buffer with 0.02% GDN and 250 mM imidazole. high-resolution features were pooled and refined using C3 symmetry.
The eluent was concentrated to 500 μl using a 100-kDa concentrator This yielded a reconstruction at 5.8 Å. We then reclassified the particles
(MilliporeSigma). MSP3D1 protein and soybean lipid (SBL) extract using this map (low-pass-filtered to 50 Å) and imposed C3 symmetry.
were mixed with the PAC protein sample using a molar ratio of 3:400:1 Subsequent refinement on reasonable 3D classes allowed us to obtain
(MSP3D1:SBL:PAC). Three rounds of biobead incubation were carried a reconstruction at 4.6-Å resolution. Finally, a third 3D classification
out to facilitate nanodisc reconstitution. The volume of the mixture was initiated by only low-pass-filtering the reference to 7 Å and with C3
was then expanded to 12.5 ml so that the imidazole concentration was symmetry. This classification helped obtain a homogeneous particle set
10 mM. Empty nanodiscs were removed by passing the mixture through that gave a reconstruction at 4.2 Å after refinement. We then performed
talon resin a few times. MSP3D1–PAC was eluted using TBS buffer con- signal subtraction and 3D classification without image alignment by
taining 250 mM imidazole and concentrated to 500 μl. Thrombin (0.01 focusing on the ECD and part of the TMD proximal to the ECD. This step
mg ml–1) was then added to the eluent to cleave eGFP at 4 °C overnight. allowed us to further push the overall map resolution to 3.73 Å for PAC
The mixture was further purified by size-exclusion chromatography at pH 4. We also attempted to only refine the ECD and part of the TMD
for the pH 4 data, which resulted in a map at 3.66-Å resolution. The pH 4 issue, we generated an ensemble of PAC models based on the pH 8
data-processing workflow is summarized in Extended Data Fig. 3a. and pH 4 structures using Rosetta. Specifically, the fixed backbone
design protocol was used to sample the side-chain rotamers43. A total
Model building of 1,000 atomic models were built and subjected to pKa prediction
The pH8-PAC model was built de novo using Coot29. Registers were iden- using PROPKA37. The mean and standard deviation of the pKa for his-
tified by secondary structure prediction from the JPred web server and tidine, glutamate and aspartate residues are provided in Extended
bulky residues in the density30. Both the full map and the ECD-focused Data Fig. 9a.
map were used during model building. We were able to model residues
61–338 into the map. Extra density observed on Asn148, Asn155, Asn162 Electrophysiology
and Asn190 in the ECD was modelled as N-acetyl-d-glucosamine (NAG) PAC-knockout HEK293 cells were seeded on coverslips and transfected
to represent N-linked glycosylation. Real-space refinement was per- with wild-type or mutant PAC plasmids using Lipofectamine 2000
formed in PHENIX to produce the final model31. (Thermo Fisher Scientific). The cells were recorded around 1 day after
The pH4-PAC model was first generated by the RosettaEM flexible transfection. Whole-cell patch-clamp recordings were performed as
fitting tools, with the pH8-PAC map as the starting point32. The model described previously4. The extracellular recording solution contained
was then manually adjusted in Coot and subjected to PHENIX real-space (in mM): 145 NaCl, 2 KCl, 2 MgCl2, 1.5 CaCl2, 10 HEPES, 10 glucose (300–
refinement31. The final model contains residues 53–339 of PAC. Models 310 mOsm/kg; pH 7.3, titrated with NaOH). Acidic extracellular solu-
and maps are visualized using UCSF Chimera, UCSF ChimeraX and tions were made of the same ionic composition with 5 mM sodium
PyMOL33–35. citrate as the buffer instead of HEPES, and pH was adjusted using citric
acid. Solutions were applied locally using a gravity perfusion system
Deglycosylation assay with a small tip 100–200 μm away from the recording cell. The intracel-
Adherent tsA201 cells transiently transfected with wild-type PAC–eGFP lular recording solution contained (in mM): 135 CsCl, 1 MgCl2, 2 CaCl2,
were solubilized using TBS buffer with 1% GDN for 1 h at 4 °C. The sample 10 HEPES, 5 EGTA, 4 MgATP (280–290 mOsm/kg; pH 7.2, titrated with
was centrifuged at 20,000g for 30 min. Deglycosylation was facilitated CsOH). Pipette solution used to observe PAC current at 0 mV contained
by mixing the PNGase F with the supernatant and incubating at room (in mM): 50 NaCl, 100 sodium gluconate, 10 HEPES (280–290 mOsm/kg;
temperature overnight. For the control reaction, the same amount of pH 7.2, adjusted with NaOH). Patch pipettes (2–4 MΩ) were pulled with
water was added instead of PNGase F enzyme. The next day, the sample a Model P-1000 multi-step puller (Sutter Instruments).
was mixed with 2× SDS sample-loading buffer (Sigma) and resolved For selectivity experiments, the extracellular solution used contained
by SDS–PAGE electrophoresis. The gel was imaged in the ChemiDoc (in mM): 15 or 150 NaCl, 10 MES, 10 glucose (osmolality adjusted with
system by probing the far-red and GFP signal (mUV 680 and 488 nm). mannitol to 300–310 mOsm/kg; pH adjusted with methanesulfonic
acid to 5.0). The pipette solution contained (in mM) 150 NaCl, 10 HEPES
Molecular dynamics simulations (280–290 mOsm/kg; pH 7.2, adjusted with NaOH). Voltage ramp pulses
The structure of PAC in the pH 8 or pH 4 state was used as the starting were applied every 3 s from −100 to +100 mV at a speed of 1 mV/ms,
model. Missing side-chain atoms were fixed using the PDB2PQR util- and a holding potential of 0 mV. The recorded currents were used to
ity36. Titratable residues were assigned as the predominant protona- generate I–V curves for reversal potential determination. The per-
tion state based on the predicted pKa value from PROPKA3 at pH 8 or meability ratios were calculated from shifts in the reversal potential
pH 4 (ref. 37). The membrane orientation of the protein was calculated using the Goldman–Hodgkin–Katz equation44. For the measurement
using the OPM server38. Subsequent system preparation was conducted of pH sensitivity, currents were normalized to the maximal current at
in CHARMM-GUI39. POPC lipids were selected to construct the lipid pH 4.6. The normalized data were then fitted to a pH dose–response
bilayer. The rest of the protein was solvated and neutralized in 150 mM curve equation
NaCl. The resulting simulation box had dimensions of approximately
87 × 87 × 163 Å.
$$Y = \text{bottom} + \frac{\text{top} - \text{bottom}}{1 + 10^{(X - \mathrm{pH}_{50}) \times \text{HillSlope}}}$$
All-atom molecular dynamics (MD) simulation was carried out using
Gromacs v. 2019.2 (ref. 40). CHARMM36m force field was used to param-
eterize the MD system41. The steepest-descent algorithm was used to to estimate the pH50 and Hill’s slope (Hill coefficient). Recordings were
minimize the energy of the system so that the Fmax was below 1,000 kJ done at room temperature with a MultiClamp 700B amplifier and 1550B
mol–1 nm–1. The NVT ensemble was then started to keep the temperature digitizer (Molecular Devices). Current signals were filtered at 2 kHz and
of the system at 310 K. Subsequently, the NPT ensemble was enabled by digitized at 10 kHz. Series resistance was compensated for at least 80%.
maintaining the system pressure at 1 bar. Protein non-hydrogen atoms Clampfit v.10.6 and GraphPad Prism v.6 or 7 were used for data analyses.
and phosphorus groups of POPC were restrained during NVT and NPT
equilibration. Production simulation continued from the NPT equili- Reporting summary
brated system with the restraints disabled. A Nosé–Hoover thermostat Further information on research design is available in the Nature
and a Parrinello–Rahman barostat were used to maintain system tem- Research Reporting Summary linked to this paper.
perature and pressure, respectively. Hydrogen atoms were constrained
using the LINCS algorithm42. For efficient GPU acceleration, a Verlet
cut-off scheme (12 Å) was enabled to maintain the particle neighbour
Data availability
list. We performed 100-ns simulation for both PAC pH 8 and PAC pH 4 The cryo-EM density maps and coordinates of pH8-PAC and pH4-PAC
conditions using a time step of 2 fs. Analysis of the MD trajectory was have been deposited in the Electron Microscopy Data Bank (EMDB)
conducted using the utilities inside Gromacs. Specifically, the slice of under accession numbers EMD-22403 and EMD-22404 and in the RCSB
water molecules in each snapshot was extracted using the gmx select Protein Data Bank (PDB) under accession codes 7JNA and 7JNC.
command. The coordinates of oxygen atoms in the water molecules
20. Goehring, A. et al. Screening and large-scale expression of membrane proteins in
were then projected to the x/y plane for visualization. mammalian cells for structural studies. Nat. Protoc. 9, 2574–2585 (2014).
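
The slice-and-project step described for the water analysis can be sketched in a few lines of Python. This is not the authors' script: it assumes the water-oxygen coordinates for one snapshot have already been exported from the trajectory into a list of (x, y, z) tuples in Å, and the function name and the slab bounds `z_min` and `z_max` are hypothetical.

```python
from typing import Iterable, List, Tuple

Coord = Tuple[float, float, float]


def project_slab_to_xy(
    oxygens: Iterable[Coord], z_min: float, z_max: float
) -> List[Tuple[float, float]]:
    """Keep water oxygens whose z lies in [z_min, z_max] and drop the z value.

    The result is the set of points one would plot to visualize water
    occupancy within a thin slab parallel to the membrane plane.
    """
    return [(x, y) for x, y, z in oxygens if z_min <= z <= z_max]


# Made-up coordinates for a single snapshot (illustration only).
snapshot = [
    (10.0, 12.5, 40.2),
    (11.3, 9.8, 55.0),
    (8.7, 14.1, 41.9),
]

points = project_slab_to_xy(snapshot, z_min=40.0, z_max=42.0)
print(points)
```

Repeating this over every snapshot and overlaying the resulting points gives a density-like picture of where water resides within the chosen slab.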
21. Haley, E. et al. Expression and purification of the human lipid-sensitive cation channel
pKa prediction TRPC3 for structural determination by single-particle cryo-electron microscopy. J. Vis.
Exp. 143, e58754 (2019).
We noticed that the pKa prediction was very sensitive to the side-chain 22. Mastronarde, D. N. Automated electron microscope tomography using robust prediction
orientations of the input structure model. To partially account for this of specimen movements. J. Struct. Biol. 152, 36–51 (2005).
23. Zheng, S. Q. et al. MotionCor2: anisotropic correction of beam-induced motion for 41. Huang, J. et al. CHARMM36m: an improved force field for folded and intrinsically
improved cryo-electron microscopy. Nat. Methods 14, 331–332 (2017). disordered proteins. Nat. Methods 14, 71–73 (2017).
24. Zhang, K. Gctf: real-time CTF determination and correction. J. Struct. Biol. 193, 1–12 42. Hess, B. P-LINCS: a parallel linear constraint solver for molecular simulation. J. Chem.
(2016). Theory Comput. 4, 116–122 (2008).
25. Rohou, A. & Grigorieff, N. CTFFIND4: fast and accurate defocus estimation from electron 43. Leaver-Fay, A., Kuhlman, B. & Snoeyink, J. An adaptive dynamic programming algorithm
micrographs. J. Struct. Biol. 192, 216–221 (2015). for the side chain placement problem. Pac. Symp. Biocomput. 10, 16–27 (2005).
26. Scheres, S. H. W. RELION: implementation of a Bayesian approach to cryo-EM structure 44. Yang, H. et al. TMEM16F forms a Ca2+-activated cation channel required for lipid
determination. J. Struct. Biol. 180, 519–530 (2012). scrambling in platelets during blood coagulation. Cell 151, 111–122 (2012).
27. Punjani, A., Rubinstein, J. L., Fleet, D. J. & Brubaker, M. A. cryoSPARC: algorithms for rapid 45. Zhang, Y. & Skolnick, J. TM-align: a protein structure alignment algorithm based on the
unsupervised cryo-EM structure determination. Nat. Methods 14, 290–296 (2017). TM-score. Nucleic Acids Res. 33, 2302–2309 (2005).
28. Zivanov, J., Nakane, T. & Scheres, S. H. W. A Bayesian approach to beam-induced motion
correction in cryo-EM single-particle analysis. IUCrJ 6, 5–17 (2019).
29. Emsley, P. & Cowtan, K. Coot: model-building tools for molecular graphics. Acta Acknowledgements We thank G. Zhao and X. Meng for support with data collection at the
Crystallogr. D 60, 2126–2132 (2004). David Van Andel Advanced Cryo-Electron Microscopy Suite; the HPC team of VARI for
30. Drozdetskiy, A., Cole, C., Procter, J. & Barton, G. J. JPred4: a protein secondary structure computational support; and D. Nadziejka for technical editing. W.L. is supported by National
prediction server. Nucleic Acids Res. 43, W389–W394 (2015). Institutes of Health (NIH) grants R56HL144929 and R01NS112363; Z.Q. is supported by a
31. Afonine, P. V. et al. New tools for the analysis and validation of cryo-EM maps and atomic McKnight Scholar Award, a Klingenstein-Simon Scholar Award, a Sloan Research Fellowship in
models. Acta Crystallogr. D 74, 814–840 (2018). Neuroscience and NIH grants R35GM124824 and R01NS118014; Z.R. is supported by an
32. Wang, R. Y. R. et al. Automated structure refinement of macromolecular assemblies from American Heart Association (AHA) postdoctoral fellowship (grant 20POST35120556); J.O.-O. is
cryo-EM maps using Rosetta. eLife 5, e17219 (2016). supported by an AHA predoctoral fellowship (grant 18PRE34060025); and J.D. is supported by
33. Pettersen, E. F. et al. UCSF Chimera—a visualization system for exploratory research and a McKnight Scholar Award, a Klingenstein-Simon Scholar Award, a Sloan Research Fellowship
analysis. J. Comput. Chem. 25, 1605–1612 (2004). in Neuroscience, a Pew Scholar in the Biomedical Sciences and NIH grant R01NS111031.
34. Goddard, T. D. et al. UCSF ChimeraX: meeting modern challenges in visualization and
analysis. Protein Sci. 27, 14–25 (2018). Author contributions W.L. and Z.Q. supervised the project. Z.R. purified PAC, prepared and
35. The PyMOL Molecular Graphics System, v.2.1. (Schrödinger, LLC, 2020). screened cryo-EM samples, performed cryo-EM data collection and processing and
36. Dolinsky, T. J., Nielsen, J. E., McCammon, J. A. & Baker, N. A. PDB2PQR: an automated performed computational simulation. J.O.-O. cloned the PAC constructs and performed
pipeline for the setup of Poisson–Boltzmann electrostatics calculations. Nucleic Acids electrophysiological studies. Z.R., J.O.-O., J.D., Z.Q. and W.L. contributed to data analysis and
Res. 32, W665–W667 (2004). manuscript preparation.
37. Olsson, M. H. M., Søndergaard, C. R., Rostkowski, M. & Jensen, J. H. PROPKA3: consistent
treatment of internal and surface residues in empirical pKa predictions. J. Chem. Theory Competing interests The authors declare no competing interests.
Comput. 7, 525–537 (2011).
38. Lomize, M. A., Pogozheva, I. D., Joo, H., Mosberg, H. I. & Lomize, A. L. OPM database and Additional information
PPM web server: resources for positioning of proteins in membranes. Nucleic Acids Res. Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-
40, D370–D376 (2012). 2875-7.
39. Jo, S., Kim, T., Iyer, V. G. & Im, W. CHARMM-GUI: a web-based graphical user interface for Correspondence and requests for materials should be addressed to Z.Q. or W.L.
CHARMM. J. Comput. Chem. 29, 1859–1865 (2008). Peer review information Nature thanks Lily Jan, Stephan Kellenberger and Kenton Swartz for
40. Abraham, M. J. et al. Gromacs: high performance molecular simulations through their contribution to the peer review of this work.
multi-level parallelism from laptops to supercomputers. SoftwareX 1–2, 19–25 (2015). Reprints and permissions information is available at http://www.nature.com/reprints.
Extended Data Fig. 1 | Purification of PAC and biochemical and biophysical cation-π interaction with Arg237′ and hydrophobic interactions with Tyr267′
analysis. a, Fluorescence size-exclusion chromatography (FSEC) of PAC–GFP and Phe282′ from the adjacent subunit. The two subunits are in green and blue.
solubilized in GDN detergent. b, SDS–PAGE gel of purified PAC–GFP protein f, FSEC traces of GFP-tagged wild-type PAC and the F196A mutant solubilized
after metal affinity chromatography. The uncropped source gel of the image using GDN detergent. The peak position of F196A is shifted and is broader
can be found in Supplementary Fig. 1a. The gel was repeated three times from compared to the wild type, suggesting that F196A interferes with the proper
different batches of purification and similar results were obtained. c, SEC assembly of PAC. g, The whole-cell current density of wild-type PAC and
profile of PAC in MSP3D1 nanodiscs. d, A deglycosylation assay of PAC–GFP PAC(F196A) recorded at pH 4.6 with a holding potential of 100 mV. The centre
with or without PNGase F treatment. The GFP and far-red signal (Alexa 488 and error bar represents mean and s.e.m. Two-tailed unpaired t-test was used to
Alexa 680) of the gel was detected and merged using ChemiDoc imaging determine the difference in current density between F196A and the wild type
system (BioRad). The uncropped source gel of the image can be found in (P = 3.09 × 10⁻⁶). D’Agostino & Pearson omnibus test was performed to check the
Supplementary Fig. 1b. The deglycosylation assay was repeated twice with normality of the data (P values are 0.846 and 0.349 for wild type (n = 10) and
similar results. e, F196 mediates intersubunit interactions by forming a F196A (n = 11), respectively). ***denotes P < 0.001.
Extended Data Fig. 2 | Workflow for cryo-EM data-processing of pH8-PAC
and data statistics. a, A total of 16,733 raw movie stacks were collected and
processed with motion correction, CTF estimation and particle picking.
Particles were subjected to two rounds of 2D classification and a 3D
classification run to obtain a homogeneous particle set. To further sort out
conformational heterogeneity, we attempted to subtract and classify (1)
particles without nanodiscs and (2) the ECD of PAC (residues 72–317) by using a
mask. Subsequent refinement allowed us to obtain a map at 3.60-Å resolution
for the entire PAC protein and 3.36-Å resolution for the ECD. b, Representative
micrograph, 2D class averages, Fourier shell correlation (FSC) curves and
angular distribution of particles used for 3D reconstruction for the pH8-PAC
dataset. The gold-standard 0.143 threshold was used to determine map
resolution based on the FSC curve. The threshold for model versus map
correlation was 0.5 to determine the resolution.
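
As a rough illustration of how a resolution estimate is read off an FSC curve at a fixed threshold, the following Python sketch finds the first crossing of 0.143 by linear interpolation between sampled shells. The FSC values are invented for the example and the function name is hypothetical; cryo-EM packages carry out this step internally and may use different interpolation conventions.

```python
def resolution_at_threshold(spatial_freqs, fsc_values, threshold=0.143):
    """Return the resolution (in Å) at which the FSC first drops below threshold.

    spatial_freqs: shell frequencies in 1/Å, in ascending order.
    fsc_values: FSC value for each shell (same length as spatial_freqs).
    Returns None if the curve never crosses the threshold.
    """
    shells = list(zip(spatial_freqs, fsc_values))
    for (f0, c0), (f1, c1) in zip(shells, shells[1:]):
        if c0 >= threshold > c1:
            # Linear interpolation between the two shells bracketing the crossing.
            fraction = (c0 - threshold) / (c0 - c1)
            crossing_freq = f0 + fraction * (f1 - f0)
            return 1.0 / crossing_freq
    return None


# Invented FSC samples (illustration only, not data from this study).
freqs = [0.10, 0.20, 0.25, 0.30, 0.35]
fsc = [0.99, 0.80, 0.443, 0.043, 0.01]

resolution = resolution_at_threshold(freqs, fsc)
print(f"Estimated resolution: {resolution:.2f} Å")
```

The same function can be called with `threshold=0.5` to mimic the model-versus-map criterion mentioned in the legend.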

Extended Data Fig. 3 | Workflow for cryo-EM data-processing of pH4-PAC and data statistics. a, A total of 26,689 raw movie stacks were collected and processed with motion correction, CTF estimation and particle picking. Two rounds of 2D classification were performed to clean up junk particles. Subsequently, particles belonging to the 2D class averages with clear features were subjected to three rounds of 3D classification. The initial 3D classification was conducted by using the pH8-PAC map low-pass filter set to 50 Å as the reference. No symmetry operator was imposed in this step. After refinement with C3 symmetry, a 5.8-Å-resolution map for pH4-PAC was obtained. Subsequently, the second 3D classification job was conducted by using the 5.8-Å map as the reference and the low-pass filter set to 50 Å. We imposed C3 symmetry at this step to increase the classification efficiency. This allowed us to obtain a map at 4.6 Å after refinement. Finally, a third 3D classification job was launched by using the 4.6-Å pH4-PAC map as the reference and the low-pass filter set to 7 Å. The C3 symmetry was also imposed. This classification pushed the resolution of the pH4-PAC map to 4.2 Å. In an effort to obtain a more homogeneous particle set, we subtracted the ECD of the pH4-PAC map (residues 72–317) and classified the refined particles without image alignment. In the end, we obtained a reconstruction of the pH4-PAC map at 3.73-Å resolution and a pH4-PAC ECD map at 3.66-Å resolution. b, Representative micrograph, 2D class averages, Fourier shell correlation (FSC) curves and angular distribution of particles used for 3D reconstruction for the pH4-PAC dataset. The gold-standard 0.143 threshold was used to determine map resolution based on the FSC curve. The threshold for model versus map correlation was 0.5 to determine the resolution.
Extended Data Fig. 4 | Local-resolution cryo-EM maps, representative densities of cryo-EM maps and domain organization of human PAC. a, The local resolution of the pH8-PAC map. A non-sliced (left) and a sliced (right) view of the map viewed parallel to the membrane are shown. The unit for the colour key is Å. b, Representative densities of several secondary structural elements of pH8-PAC. The atomic model is overlaid with the density to show the side chain information. c, The local resolution of the pH4-PAC map. A non-sliced (left) and a sliced (right) view of the map viewed parallel to the membrane are shown. The unit for the colour key is Å. d, Representative densities of several secondary structural elements of pH4-PAC. The atomic model is overlaid with the density to show the side chain information. e, The pH8-PAC single subunit viewed parallel to the membrane. The wrist, palm, thumb, finger and β-ball domains are highlighted. f, The pH4-PAC single subunit viewed in the same orientation as the right image of panel e. g, Domain organization of PAC. Clusters of secondary structure that form the palm, finger, thumb and β-ball domains are labelled.
Extended Data Fig. 5 | Comparison of the structures of PAC and ASIC. a–d, Structural comparison of human PAC (a, c) with chicken ASIC1a (b, d) viewed parallel to the membrane (a, b) and from the extracellular side (c, d). The acidic pockets of human PAC and chicken ASIC1a are in different locations. e, Overlay of the pH8-PAC (blue) and pH4-PAC (red) single subunit with the chicken ASIC1a (green) subunit. The ECD of ASIC1a is composed of a β-sheet core and the exterior helical structure. Although the β-sheet core shares high similarity with the human PAC structure, the chicken ASIC1a TMD is organized differently from that of the human PAC.
Extended Data Fig. 6 | Sequence alignment of PAC homologues and ASIC. Sequence alignment of PAC homologues (from human, frog (XENLA) and zebrafish (DANRE)) and chicken ASIC1. The ASIC1 sequence is aligned with PAC based on the structural alignment using TMalign (ref. 45). Secondary structural (SS) elements of PAC are labelled at the top, whereas the SS elements of ASIC1 are indicated at the bottom. Cysteine residues mediating disulfide bonds in the extracellular domain of PAC are marked with yellow dots. Putative N-linked glycosylation sites of PAC are highlighted with green dots. Lys319 of PAC is marked with red dots. The pre-TM2 helix observed in the pH4-PAC structure is indicated with a red frame. PAC lacks the α1, α2, α3, α4 and α5 helices that form the ECD exterior helical structure in chicken ASIC1a, whereas the αA and αB helices are unique to PAC.
Extended Data Fig. 7 | PAC channel desensitization. a, A representative whole-cell current trace of PAC in wild-type HEK293 cells upon extracellular acidification at pH 4.6 and pH 4.0 with a holding potential at 100 mV. Substantial desensitization was observed during the prolonged exposure to the pH 4.0 solution (position 4 versus position 3), but not to the pH 4.6 solution (position 2 versus position 1). b, Quantification of PAC desensitization (pH 4.6 (n = 12) and pH 4.0 (n = 11)) as shown in a. Activation and desensitization currents are normalized to the initial PAC currents. The x axis numbers correspond to the red marker location in a. Each data point is represented by a solid dot. The mean and s.e.m. are represented by the bar graph. c, Representative whole-cell current–voltage traces of PAC at the beginning (position 3 in a) and the end (position 4 in a) of pH 4.0 treatment. d, Reversal potential of PAC at the beginning and the end of pH 4.6 and pH 4.0 treatment, respectively (n = 9). Two-tailed paired t-test was used to determine significance (P values are 0.361 and 0.077 for pH 4.6 and pH 4.0, respectively). D’Agostino & Pearson omnibus test was performed to check the normality of the data (P values are 0.673 and 0.335 for pH 4.6 and pH 4.0 conditions, respectively). NS indicates P > 0.05. e, Whole-cell patch-clamp recording configuration with 50 mM NaCl pipette solution and 150 mM bath solutions (scheme depicted on the left). This creates the concentration gradient necessary to observe any potential PAC current at 0 mV. Owing to the small amplitude of endogenous PAC current at 0 mV, we transfected PAC cDNA in PAC knockout HEK293 cells. The representative whole-cell current trace of PAC upon acidification at 0 mV is shown on the right. Locations 1 and 3 represent initial activation of PAC immediately after acidic buffer treatment. Locations 2 and 4 represent desensitized PAC after prolonged acidic buffer treatment. f, The desensitized currents (positions 2 and 4 in e) are normalized to the initial PAC currents (positions 1 and 3 in e). The desensitized currents are represented by the normalized average ± s.e.m.
Extended Data Fig. 8 | Lateral fenestration and ion selectivity of PAC. a, The reversal potential (Vrev) of wild-type PAC, PAC(K325E) and PAC(K329E) at 150 mM NaCl (black) or 15 mM NaCl (red) in the bath solution (internal solution contains 150 mM NaCl). The bar graph represents the mean and s.e.m. (n = 16 (wild type), n = 8 (K325E) and n = 6 (K329E)). Individual data points are shown as dots. The same data points for the wild type were also used in Fig. 3i for comparison with K319E. b, The relative Cl−/Na+ permeability for wild-type PAC (n = 16), and K325E (n = 8) and K329E (n = 6) mutants calculated from the pH-5-induced current at 100 mV. The centre and error bar represent the mean and s.e.m. of the permeability ratio. Individual data points are shown as solid dots. The same data points for the wild type were also used in Fig. 3j for comparison with K319E. The average PCl/PNa permeability values are indicated for each construct. c, The current density of wild-type PAC (n = 10), and K325E (n = 10) and K329E mutants (n = 10) at pH 4.6 with a holding potential of 100 mV. The bar graph shows the average normalized current density ± s.e.m. One-way ANOVA with Bonferroni post-hoc test was used to determine the significance (P values are 0.832 and 0.416 for K325E and K329E, respectively). D’Agostino & Pearson omnibus test was performed to check the normality of the data (P values are 0.255, 0.153 and 0.293 for the wild type and K325E and K329E mutants, respectively). NS indicates P > 0.05. d, The pH dose–response curve of wild-type PAC, PAC(K325E) and PAC(K329E). The currents are normalized to those at pH 4.6 (n = 8 (wild-type PAC), n = 6 (PAC(K325E)) and n = 7 (PAC(K329E))). The currents at different pH are represented by the average normalized currents ± s.e.m. A nonlinear fitting to a sigmoidal dose–response curve is generated for each construct. e, Representative whole-cell patch-clamp recording at pH 5.0 with 150 mM NaCl pipette solution and 150 mM (black) or 15 mM NaCl (red) bath solutions. The current–voltage relationship of wild-type (left), K325E (middle) and K329E (right) PAC in two different bath solutions are plotted. The same wild-type traces were also shown in Fig. 3j (left) for comparison with K319E. f, i, The pH8-PAC and pH4-PAC extracellular fenestration viewed from the extracellular side (left) and parallel to the membrane (right), respectively. Residues forming the fenestration are shown in sticks, including three negatively charged residues (Asp91, Glu94 and Glu250) for pH8-PAC and two positively charged residues (Arg93 and Lys294) for pH4-PAC. g, j, Radius of the fenestration tunnel, estimated by CAVER v.3.0, for pH8-PAC (g) and pH4-PAC (j). The horizontal line marks the smallest radius along the tunnel. The residues lining the fenestration tunnel are marked. h, k, Fenestration water-density plot for pH8-PAC (h) and pH4-PAC (k) from a 100-ns MD simulation. Water molecules in the Z range of the side fenestration site are projected to the X/Y plane and are shown as a 2D histogram.
Extended Data Fig. 9 | His98 is involved in PAC pH sensing. a, pKa prediction of titratable residues for the pH8 and pH4 structures of human PAC. The mean and error bar (standard deviation) are calculated based on 1,000 fixed-backbone rotamer ensembles generated from each structure (see Methods). b, SDS gel of GFP-tagged wild-type PAC, PAC(H98C/Q296C) and PAC(H98S/Q296S). A dimeric band is observed for the H98C/Q296C mutant, but not for the wild type and the H98S/Q296S mutant. The unedited source gel of the image can be found in Supplementary Fig. 1c. The gel was independently repeated twice with similar results. c, The FSEC profile of GFP-tagged wild-type PAC, PAC(H98C/Q296C) and PAC(H98S/Q296S) solubilized using GDN detergent. d, The whole-cell current density of wild-type PAC, PAC(H98C/Q296C) and PAC(H98S/Q296S) recorded at pH 5.0 at 100 mV. The bar graph shows the average current density (nA/pF) ± s.e.m. Each individual data point represents a cell (n = 8 (wild type), n = 10 (H98C/Q296C) and n = 12 (H98S/Q296S)). Two-tailed unpaired t-test was used to determine the difference in current density compared to the wild type (P values are 1.08 × 10⁻⁶ for H98C/Q296C and 0.321 for H98S/Q296S). D’Agostino & Pearson omnibus test was performed to check the normality of the data (P values are 0.328, 0.154 and 0.727 for the wild type and the H98C/Q296C and H98S/Q296S mutants, respectively). e, The pH dose–response curve of wild-type PAC and PAC(H98S/Q296S). The currents are normalized to those at pH 4.6 (n = 5 (wild-type PAC); n = 6 (PAC(H98S/Q296S))). A nonlinear fitting to a sigmoidal dose–response curve is generated for each construct. Bar plot shows the mean ± s.e.m. f, The pH50 of wild-type PAC and PAC(H98S/Q296S) estimated from the pH dose–response curve. The centre and bar represent the estimated pH50 and s.e.m. from the nonlinear fitting in e. Two-tailed Mann–Whitney test was used to determine the significance (P = 0.0087). g, The proposed pH-sensing mechanism for PAC. At high pH, the deprotonated His98 residue is surrounded by Gln296, Ser102 and Iso298, and TM1 pairs with TM2 from the same subunit. At low pH, the protonated His98 residue undergoes a conformational change and moves into an acidic pocket. As a result, TM1 dissociates from the resting interface and rotates to interact with TM2 of the adjacent subunit. For all panels, NS indicates P > 0.05, ** denotes a P value between 0.01 and 0.001 and *** denotes P < 0.001; n represents measurements from biologically independent cells.
Extended Data Table 1 | Cryo-EM data collection, refinement and validation statistics
Corresponding author(s): Wei Lü
Reporting Summary

Software and code

Data collection: SerialEM 3.7, ClampFit 10.6
Data analysis: Gctf-1.06, ctffind-4.1.10, Gautomatch-0.56, Relion-3.0, CryoSparc-v0.6.5, coot-0.8.9.2, pymol-2.3.2, Motioncor2-1.2.1, phenix.real_space_refine_dev_3500, phenix.molprobity_dev_3500, UCSF chimera_1.13.1, UCSF chimeraX_0.91, GraphPad Prism 6 and 7, ClampFit_10.6, GROMACS version 2019.2, OPM server (https://opm.phar.umich.edu/ppm_server), CHARMM-GUI (http://www.charmm-gui.org/), Rosetta 2020.08.61146, propka3.1, TMalign v20190822
Data
The cryo-EM density map and coordinates of pH8-hsPAC and pH4-hsPAC have been deposited in the Electron Microscopy Data Bank (EMDB) under accession
numbers EMD-22403 and EMD-22404 and in the Research Collaboratory for Structural Bioinformatics Protein Data Bank under accession codes 7JNA and 7JNC.

Sample size: Sample size was not pre-determined for the study. All the electrophysiology experiments were repeated at least five times using different cells. The sample size was determined based on the consistency/variability of the recordings. Overall, the sample sizes were deemed sufficient based on the clearly visible effects of the mutations on the overall distribution of data points within and between each group (the sample standard deviation is usually much smaller than the effect we are aiming to test). However, we acknowledge that our current sample size may not be sufficient to detect small effects that may be present in some of the comparisons, leading us to accept the null hypothesis.
Data exclusions: No data were excluded from the analysis.
Replication: We performed each group of experiments with several batches of cells and different transfections to ensure reproducibility within the lab. The number of biologically independent experimental replications is indicated in the figure legend. For electrophysiology recordings, the number is at least 5. SDS-PAGE gel experiments were repeated at least twice.
Randomization: Our experiment is not randomized. For electrophysiology experiments, cells with GFP fluorescence (proteins were GFP-tagged) were randomly selected. Other experiments, including protein expression, solubilization test, protein purification, deglycosylation assay, and cryo-EM grid preparation and data collection, were repeated multiple times; each time, proteins from different random batches were used.
Blinding: The investigators were not blinded; it was not technically or practically feasible to do so for cryo-EM or patch-clamp studies.



Cell line source(s): Sf9 cells, tsA201 cells and HEK293 cells were purchased from ATCC.
Authentication: The cells were purchased and routinely maintained in our lab. They were not authenticated experimentally for these studies.
Mycoplasma contamination: Sf9 cells, tsA201 cells and HEK293 cells tested negative for Mycoplasma contamination.
Commonly misidentified lines: No commonly misidentified lines were used.

Matters arising
Crop asynchrony stabilizes food production

Lukas Egli1,2 ✉, Matthias Schröter1, Christoph Scherber3,4, Teja Tscharntke5,6 & Ralf Seppelt1,7
https://doi.org/10.1038/s41586-020-2965-6
Received: 12 February 2020

arising from D. Renard & D. Tilman Nature https://doi.org/10.1038/s41586-019-1316-y (2019)
Stable agricultural systems are fundamental for the reliability of agricultural production and food security. Recently, Renard and Tilman1 reported that crop diversity, calculated as the exponential value of the Shannon diversity index of harvested areas of 176 crops, stabilizes national food production. Here we show that crop asynchrony (that is, asynchronous production trends between different crops2) is an even better predictor of agricultural production stability than is crop diversity. Our finding suggests that asynchrony is one important property that can explain why a higher crop diversity supports the stability of national food production, and that it should be considered in strategies to stabilize agricultural production through crop diversification.

We suggest that, as well as yield stability and crop diversity, two additional aspects should be considered in the discussion of the diversity–stability nexus. First, as well as yield stability, the stability of overall production is another relevant aspect of food security. Second, the actual benefits of crop diversity are not related to harvested areas as such, but to the temporal production patterns of the cultivated crops2. We suggest that planting multiple crops stabilizes agricultural production only if they experience asynchronous production trends, for example, due to distinct responses of the individual crops to climatic, economic and political shocks3. Here we use statistical models to test whether crop asynchrony is a better predictor of agricultural production stability than is crop diversity.

We largely used the same datasets as Renard and Tilman1 (Extended Data Table 1) and derived the same explanatory variables used in their analysis, including effective crop species diversity4, irrigation4, nitrogen use intensity4, warfare5, temperature and precipitation instability6–8 for five ten-year intervals between 1961 and 2010 (see Supplementary Methods for details) to predict the stability of total caloric production4,9,10. We additionally calculated synchrony between crop-specific caloric production2,11,12, an index bounded between 0 and 1, where 1 indicates full synchrony. Asynchrony was then calculated by subtracting synchrony from 1, so that higher values indicate higher asynchrony. We used total production instead of yield stability as the response variable, because this offers additional insights into food security and because it can be directly related to asynchrony (see Supplementary Methods for details). Moreover, total production incorporates the effects of changes in cropland area as a result of planning decisions by farmers and of changes in global market dynamics.

First, we investigated the relationship between effective crop species diversity and crop asynchrony and tested if this relationship changed over time, as crop homogenization has occurred during recent decades13. To predict crop asynchrony, we used a linear mixed-effects model with random slopes for diversity and random intercepts for time intervals14. Second, we investigated how either crop diversity, crop asynchrony or both affect caloric production stability. For this, we constructed the main linear regression model used in Renard and Tilman1 using production stability as the response variable (see Supplementary Methods for details). We then ran two additional regression models, one in which we replaced crop diversity with crop asynchrony and one in which we added crop asynchrony.

Crop diversity and crop asynchrony were found to be correlated (Spearman’s ρ = 0.49, P < 0.05; Fig. 1a). However, the positive effect of crop diversity on asynchrony decreased over time (Fig. 1a), as indicated by a better performance of a linear mixed-effects model including time interval (Akaike information criterion (AIC) = −340.22) compared to a linear model including crop diversity only (AIC = −336.88). The positive effect of crop asynchrony on caloric production stability was more than three times the effect of crop diversity (Fig. 1b, Extended Data Table 2). Other predictors showed similar trends, although the effect of nitrogen use intensity, time and temperature instability was stronger in the diversity model, whereas the effect of irrigation was lower and insignificant. Moreover, the explanatory power of the model increased from R² = 0.28 in the crop-diversity model to R² = 0.60 in the asynchrony model (Extended Data Table 2). In the model that includes both predictors, the stabilizing effect of crop asynchrony was even stronger and the effect of crop diversity was negative (Fig. 1b, Extended Data Fig. 1); however, explanatory power increased by only 0.01 (Extended Data Table 2). Although crop diversity and asynchrony were correlated, multicollinearity was not an issue in the combined model (the variance inflation factors were less than 2). Given that crop asynchrony was a strong predictor of caloric production stability, we further explored their relationship in the most recent time interval (2001–2010). The highest national crop asynchronies were mainly observed in South and Southeast Asia, China, Central America and parts of Africa (Fig. 2). Countries within these regions typically showed high production stability, and all countries with high asynchrony achieved at least medium stability. Countries with high production stability and low-to-medium asynchrony were mainly found in North and South America (Fig. 2). The 29 countries that had low asynchrony and stability (including Russia, Argentina and Australia) contributed more than 11% of the total crop caloric production.

Our analysis provides an important extension to the results presented by Renard and Tilman1. We found that the relationship between crop diversity and crop asynchrony decreased over time, which is a potential consequence of the increasing homogeneity of global food supplies13. Most importantly, we identified asynchrony as one important crop property (or trait) that can explain why a higher crop diversity supports the stability of national food production. Crop diversity as such provides only limited insights into the mechanism that underlies stability. The benefits of crop diversity depend on the production patterns of the cultivated crops. Therefore, strategies to stabilize agricultural production through crop diversification also need to account for the asynchrony of the crops considered.
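The two indices compared in this analysis can be illustrated with a short sketch. Effective diversity is the exponential of the Shannon index of harvested-area shares, as stated in the text. For synchrony, the sketch assumes the community-wide measure of Loreau and de Mazancourt (ref. 11), that is, the variance of total production divided by the squared sum of the crop-level standard deviations, which is bounded between 0 and 1. The exact computation used in the study is described in its Supplementary Methods; the function names and all numbers below are hypothetical.

```python
import math
from statistics import pstdev, pvariance


def effective_diversity(areas):
    """Exponential of the Shannon index of harvested-area shares."""
    total = sum(areas)
    shares = [a / total for a in areas if a > 0]
    shannon = -sum(p * math.log(p) for p in shares)
    return math.exp(shannon)


def asynchrony(production):
    """1 - synchrony, with synchrony = var(total) / (sum of crop s.d.)^2.

    production: one time series per crop, all of equal length.
    Assumes at least one crop varies over time.
    """
    totals = [sum(year) for year in zip(*production)]
    sd_sum = sum(pstdev(series) for series in production)
    synchrony = pvariance(totals) / sd_sum ** 2
    return 1.0 - synchrony


# Hypothetical numbers, for illustration only.
print(effective_diversity([10, 10, 10, 10]))     # four equally planted crops
print(asynchrony([[1, 2, 3, 4], [2, 4, 6, 8]]))  # crops rise and fall together
print(asynchrony([[1, 2, 3, 4], [4, 3, 2, 1]]))  # crops compensate each other
```

The two asynchrony calls use the same number of crops, so diversity in the sense of crop counts is identical while asynchrony sits at opposite ends of its range, which is the distinction the text draws between the two predictors.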
1UFZ - Helmholtz Centre for Environmental Research, Leipzig, Germany. 2University of Potsdam, Institute of Biochemistry and Biology, Potsdam, Germany. 3University of Münster, Institute of Landscape Ecology, Münster, Germany. 4Centre for Biodiversity Monitoring, Zoological Research Museum Alexander Koenig, Bonn, Germany. 5University of Göttingen, Agroecology, Department of Crop Sciences, Göttingen, Germany. 6University of Göttingen, Centre of Biodiversity and Sustainable Land Use (CBL), Göttingen, Germany. 7Institute of Geoscience and Geography, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany. ✉e-mail: lukas.egli@ufz.de
Nature | Vol 588 | 10 December 2020 | E7

[Figure 1. a, Asynchrony (y axis, 0.25 to 1.00) plotted against diversity (x axis, 5 to 25), coloured by time interval (1961–1970, 1971–1980, 1981–1990, 1991–2000, 2001–2010). b, Standardized regression coefficients (y axis, −0.6 to 0.6) for the diversity, asynchrony and combined models, with significance marks, for the predictors diversity, asynchrony, √(irrigation), √(N use intensity), temperature instability, precipitation instability, warfare and time.]
Fig. 1 | Crop asynchrony as a function of crop diversity and determinants of national caloric production stability. a, Crop asynchrony as a function of crop diversity using a linear mixed-effects model with random slopes for diversity, and random intercepts for time intervals. Dots show national data coloured by time interval (n = 590). b, Regression coefficients for all variables in the linear regression models, including crop diversity (green), crop asynchrony (blue) and both (orange) (n = 590). Caloric production stability was log-transformed, irrigation and nitrogen use intensity were square-root-transformed. Each predictor variable was standardized to 0 mean and 1 s.d. across all nations and time intervals. Data are mean ± s.e.m. *P < 0.05; **P < 0.01; ***P < 0.001; NS, not significant. This figure was created with the statistical software package R 3.6.1 (ref. 10).
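The variable preparation described in this caption can be sketched as follows: a square-root transformation for irrigation and nitrogen use intensity, a log transformation for the response, and rescaling of each predictor to zero mean and unit standard deviation. The helper name and the values are hypothetical. Whether the original analysis used the sample or the population standard deviation is not stated here, so the sample version is an assumption.

```python
import math
from statistics import mean, stdev


def standardize(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Hypothetical values for four nation-by-interval observations.
irrigation = [0.0, 4.0, 16.0, 36.0]
stability = [2.0, 5.0, 9.0, 20.0]

irrigation_z = standardize([math.sqrt(v) for v in irrigation])  # sqrt, then z-score
log_stability = [math.log(v) for v in stability]                # response on log scale
```

With standardized predictors, each regression coefficient is expressed per standard deviation of its predictor, which is what makes the coefficients in panel b comparable across variables measured in different units.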
The results from the crop-diversity model are largely similar to the findings of Renard and Tilman1, because the different response variables (caloric yield versus production stability) were highly correlated (Spearman’s ρ = 0.84, P < 0.05). However, the effect of irrigation was less stabilizing for production compared to yield stability, and the opposite was true for nitrogen use intensity. Moreover, overall production stability significantly decreased over time, which has serious implications for food security.

Asynchrony emerges from the distinct responses of individual crops to climatic, economic and political shocks3. Although there is increasing knowledge about the underlying drivers of overall production losses3, little is known about the effects on individual crops in various environmental and socioeconomic contexts, in particular regarding their temporal variance2,15 at farm level, at which management decisions are made. Moreover, growing crops in different seasons is additionally expected to increase asynchrony, which should be further investigated. Likewise, we need to better understand the conditions under which asynchrony is needed for and beneficial to stability. Spain, for example, experienced medium asynchrony but low stability in 2001–2010, whereas the opposite was true for Germany. In countries that have low crop asynchrony and stability, planting additional crops with different responses to climatic and market disturbances might be a viable option to increase stability and therefore food security15, in particular in light of climate change and increasing perturbations in global markets. On the national level, this is especially relevant for countries that are facing severe food insecurity, such as Malawi. For countries such as
[Figure 2. World map with a two-way colour key: stability (low to high) against asynchrony (low to high).]
Fig. 2 | National crop asynchrony and caloric production stability worldwide. Crop asynchrony and caloric production stability are shown for the 2001–2010 interval and are grouped by tertiles (n = 136). Countries excluded from the analysis are shown in white. The figure was created with the statistical software package R 3.6.1 (ref. 10).
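Grouping countries by tertiles, as in this map, can be sketched with the Python standard library. The function name and the values are hypothetical, and the quantile convention (the default "exclusive" method of `statistics.quantiles`) is an assumption; other conventions place the cut points slightly differently.

```python
from statistics import quantiles


def tertile_labels(values):
    """Label each value 'low', 'medium' or 'high' by tertile."""
    cut1, cut2 = quantiles(values, n=3)  # two cut points split the data in thirds
    labels = []
    for v in values:
        if v <= cut1:
            labels.append("low")
        elif v <= cut2:
            labels.append("medium")
        else:
            labels.append("high")
    return labels


# Hypothetical asynchrony values for six countries.
asynchrony_values = [0.10, 0.25, 0.40, 0.55, 0.70, 0.85]
labels = tertile_labels(asynchrony_values)
```

Applying the same function separately to stability and to asynchrony gives the nine low/medium/high combinations that a two-way colour key of this kind requires.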

Russia and Argentina, which recently experienced low asynchrony and stability but contributed more than 5% of the global crop calories, increasing crop asynchrony is an important aspect to consider from a global perspective. Growing trade and changing diets might further lead to crop homogenization2,13 and therefore pose risks to the stability of national and global food production.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
All datasets used and generated during this study are provided in a public repository: https://github.com/legli/AgriculturalStability.

Code availability
The codes used for data preparation and analyses are provided in a public repository: https://github.com/legli/AgriculturalStability.

1. Renard, D. & Tilman, D. National food production stabilized by crop diversity. Nature 571, 257–260 (2019).
2. Mehrabi, Z. & Ramankutty, N. Synchronized failure of global crop production. Nat. Ecol. Evol. 3, 780–786 (2019).
3. Cottrell, R. S. et al. Food production shocks across land and sea. Nat. Sustain. 2, 130–137 (2019).
4. The Food and Agriculture Organization of the United Nations Statistics (FAO, accessed 22 November 2019); https://www.fao.org/faostat/en/#data/.
5. Marshall, M. G. Codebook: Major Episodes of Political Violence (MEPV) and Conflict Regions, 1946–2015. http://www.systemicpeace.org/inscr/MEPVcodebook2016.pdf (2016).
6. Klein Goldewijk, K., Beusen, A., Doelman, J. & Stehfest, E. New anthropogenic land use estimates for the Holocene – HYDE 3.2. Earth Syst. Sci. Data 9, 927–953 (2017).
7. Sacks, W. J., Deryng, D. & Foley, J. A. Crop planting dates: an analysis of global patterns. Glob. Ecol. Biogeogr. 19, 607–620 (2010).
8. Willmott, C. J. & Matsuura, K. Terrestrial Air Temperature and Precipitation: Monthly and Annual Time Series (1950–1999). http://climate.geog.udel.edu/~climate/html_pages/README.ghcn_ts2.html (2001).
9. Food balance sheets: A handbook (FAO, 2001).
10. R Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2019).
11. Loreau, M. & de Mazancourt, C. Species synchrony and its drivers: neutral and nonneutral community dynamics in fluctuating environments. Am. Nat. 172, E48–E66 (2008).
12. Hallett, L. M. et al. codyn: An R package of community dynamics metrics. Methods Ecol. Evol. 7, 1146–1151 (2016).
13. Khoury, C. K. et al. Increasing homogeneity in global food supplies and the implications for food security. Proc. Natl Acad. Sci. USA 111, 4001–4006 (2014).
14. Bates, D., Mächler, M., Bolker, B. M. & Walker, S. C. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48 (2015).
15. Knapp, S. & van der Heijden, M. G. A. A global meta-analysis of yield stability in organic and conservation agriculture. Nat. Commun. 9, 3632 (2018).

Acknowledgements L.E. acknowledges funding from the Helmholtz Association (Research School ESCALATE, VH-KO-613). We thank V. Grimm for discussions; M. Wu for statistical support and D. Renard for discussions and the exchange of code to make our analysis clearer and more consistent. The FAOSTAT database is maintained and regularly updated by FAO with regular support from its Member States.
Author contributions L.E., M.S., T.T. and R.S. designed the study. L.E. and C.S. performed the analysis. All authors wrote the manuscript.
Competing interests The authors declare no competing interests.

Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2965-6.
Correspondence and requests for materials should be addressed to L.E.
Reprints and permissions information is available at http://www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© The Author(s), under exclusive licence to Springer Nature Limited 2020

Extended Data Fig. 1 | Main determinants of national caloric production stability. a–h, Effects of crop diversity (a), crop asynchrony (b), irrigation (c), nitrogen use intensity (d), temperature instability (e), precipitation instability (f), warfare (g) and time (h) on caloric production stability. Results are shown for the linear regression models including crop diversity (green), crop asynchrony (blue) and both (orange) (n = 590). Irrigation and nitrogen use intensity were back-transformed from square-root-transformation, predicted values were back-transformed from log-transformation. Predictions were calculated using the observed range of the focal predictor, while keeping all the other predictors at their mean values. Shaded areas represent 95% confidence intervals. The figure was created with the statistical software package R 3.6.1 (ref. 10).

Extended Data Table 1 | Data sources underlying the analyses
See refs. 4–9.

Extended Data Table 2 | Determinants of national caloric production stability
Linear regression models include crop diversity (diversity model), crop asynchrony (asynchrony model) or both (combined model) (n = 590). Caloric production stability was log-transformed; irrigation and nitrogen use intensity were square-root-transformed. Predictor variables were standardized to 0 mean and 1 s.d. across all nations and time intervals.

Matters arising
Reply to: Crop asynchrony stabilizes food production

Delphine Renard1 ✉ & David Tilman2,3
https://doi.org/10.1038/s41586-020-2966-5

replying to L. Egli et al. Nature https://doi.org/10.1038/s41586-020-2965-6 (2020)
In the accompanying Comment, Egli et al.1 report findings related to our Article2 on the stabilization of food production by crop diversity. In our Article, we reported that crop diversity stabilized the combined caloric yield of all crops in a nation2. Our analyses showed that the portfolio effect3 was the probable cause of this greater stability, and we found no support for the asynchrony hypothesis4. Egli et al.1 report somewhat different findings.

The results in our Article2 differ from those of Egli et al.1 because we analysed the year-to-year temporal stability of national caloric yield for all crops combined and analysed for asynchrony among individual crops using the national yield of each crop. By contrast, Egli et al.1 analysed the stability of total national caloric production and measured asynchrony on the basis of the annual national caloric production of each individual crop. Although yield and production are related to each other, they are not identical. Yield is the crop production per unit of land, whereas production is the yield multiplied by the cropland area. Even using their different metric, Egli et al.1 found, as did we, that greater crop diversity led to greater national temporal crop stability.

We have similar concerns about the measure of asynchrony used by Egli et al.1. To calculate asynchrony, they used data on the annual country-level production of each crop, rather than the annual country-level yields that we used. Their metric of asynchrony therefore confounds fluctuations in the performance of individual crops (as measured by yields) with fluctuations in the area planted to, and harvested for, each of these crops, just as does their stability metric. Because both yields and harvested area affect the total food supply of a country, both have important implications for the agricultural policy of a country. We suggest that viewing each of these separately might be better for providing insights into the best way to maximize the year-to-year reliability of the food supply of a country. The link found by Egli et al.1 between production asynchrony and total crop production stability suggests the possibility that asynchronous variation in area planted with and harvested for various crops might also contribute to national food stability. This interesting possibility merits further exploration.

1. Egli, L., Schröter, M., Scherber, C., Tscharntke, T. & Seppelt, R. Crop asynchrony stabilizes
We suggest that national yield stability is the more informative and food production. Nature https://doi.org/10.1038/s41586-020-2965-6 (2020).
insightful of these two stability metrics because it directly measures 2. Renard, D. & Tilman, D. National food production stabilized by crop diversity. Nature 571,
257–260 (2019).
the year-to-year reliability of food production from a typical hectare of 3. Doak, D. F. et al. The statistical inevitability of stability–diversity relationships in
cropland in a nation. If the total area planted and harvested had been community ecology. Am. Nat. 151, 264–276 (1998).
4. Loreau, M. & De Mazancourt, C. Species asynchrony and its drivers: neutral and nonneutral
constant from 1961 until now in each nation, the two measures of stabil-
community dynamics in fluctuating environments. Am. Nat. 172, E48–E66 (2008).
ity would be identical. However, harvested area has been increasing in 5. The Food and Agriculture Organization of the United Nations Statistics (FAO, accessed
many lower-income nations for the past 60 years (an increase of 69% January 2019); https://www.fao.org/faostat/en/#data/.
in the least-developed nations since the 1960s5), and it first increased
and then declined in many high-income nations (a decrease of 29% in Acknowledgements We thank the Bren School of Environment Science and Management of
Europe since the 1980s5). These year-to-year changes in total national the University of California Santa Barbara for support leading to the initial publication. This
work was also supported by a grant overseen by the French ‘Programme Investissement
cropland area and year-to-year changes in yields both affect the sta- d’Avenir’ as part of the ‘Make Our Planet Great Again’ programme (reference: 17-MPGA-0004)
bility metric used by Egli et al.1. Because the database of the Food and and by a National Science Foundation grant (LTER-1831944).
Agriculture Organization of the United Nations (FAOSTAT) reports
Author contributions D.R. and D.T. wrote the paper.
the area harvested for each crop, not the area planted, the potential
to determine the effects of changes in area planted on national food Competing interests The authors declare no competing interests.
supply stability is limited. Finally, because yield stability is independ-
ent of year-to-year changes in national cropland area, we feel that yield Correspondence and requests for materials should be addressed to D.R.
stability is more informative of underlying biological mechanisms than Reprints and permissions information is available at http://www.nature.com/reprints.
is production stability. However, when more detailed data are available, Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in
assessing changes and drivers of stability of yield and cropland area
would be informative. © The Author(s), under exclusive licence to Springer Nature Limited 2020
1
CEFE, CNRS, Univ. Montpellier, University Paul Valéry Montpellier 3, EPHE, IRD, Montpellier, France. 2Bren School of Environmental Science and Management, University of California Santa
Barbara, Santa Barbara, CA, USA. 3Department of Ecology, Evolution and Behavior, University of Minnesota, St Paul, MN, USA. ✉e-mail: delphinerenard@hotmail.fr
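
The distinction drawn in the Reply above between yield and production (production is yield multiplied by cropland area) can be illustrated with a small numerical sketch. It assumes, purely for illustration, that temporal stability is measured as the mean divided by the standard deviation of a yearly series; this is an assumption of the sketch, not a statement of the exact metric used in either study. All numbers are hypothetical.

```python
from statistics import mean, pstdev


def stability(series):
    """Temporal stability as mean / standard deviation (an assumed, illustrative metric)."""
    return mean(series) / pstdev(series)


# Hypothetical national yields (calories per hectare) over five years.
yields = [10.0, 12.0, 9.0, 11.0, 13.0]

# Case 1: harvested area is constant, so production is a fixed multiple of yield.
constant_area = [100.0] * 5
production_constant = [y * a for y, a in zip(yields, constant_area)]

# Case 2: harvested area grows every year, adding variation unrelated to yield.
growing_area = [100.0, 120.0, 140.0, 160.0, 180.0]
production_growing = [y * a for y, a in zip(yields, growing_area)]

yield_stability = stability(yields)
stability_constant = stability(production_constant)
stability_growing = stability(production_growing)

print(f"yield stability:                 {yield_stability:.3f}")
print(f"production stability (constant): {stability_constant:.3f}")
print(f"production stability (growing):  {stability_growing:.3f}")
```

With constant area, the production series is a scaled copy of the yield series, so the two stability values coincide. Once area trends over time, production stability also reflects the change in area, which is the confounding the Reply describes.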

Corrections & amendments
Author Correction: Area-based conservation in the twenty-first century
https://doi.org/10.1038/s41586-020-2952-y
Correction to: Nature https://doi.org/10.1038/s41586-020-2773-z
Published online 07 October 2020
Sean L. Maxwell, Victor Cazalis, Nigel Dudley, Michael Hoffmann, Ana S. L. Rodrigues, Sue Stolton, Piero Visconti, Stephen Woodley, Naomi Kingston, Edward Lewis, Martine Maron, Bernardo B. N. Strassburg, Amelia Wenger, Harry D. Jonas, Oscar Venter & James E. M. Watson

In this Review, the affiliation to which authors Victor Cazalis and Ana S. L. Rodrigues are attributed (affiliation 2) should be corrected from ‘Centre d’Ecologie Fonctionnelle et Evolutive CEFE UMR 5175, CNRS, Univ. de Montpellier, Univ. Paul-Valéry Montpellier, EPHE, Montpellier, France’ to ‘CEFE, Univ. Montpellier, CNRS, EPHE, IRD, Univ. Paul Valéry Montpellier 3, Montpellier, France’. This error has been corrected online.

Author Correction: Elephant shark genome provides unique insights into gnathostome evolution
https://doi.org/10.1038/s41586-020-2967-4
Correction to: Nature https://doi.org/10.1038/nature12826
Published online 08 January 2014
Byrappa Venkatesh, Alison P. Lee, Vydianathan Ravi, Ashish K. Maurya, Michelle M. Lian, Jeremy B. Swann, Yuko Ohta, Martin F. Flajnik, Yoichi Sutoh, Masanori Kasahara, Shawn Hoon, Vamshidhar Gangu, Scott W. Roy, Manuel Irimia, Vladimir Korzh, Igor Kondrychyn, Zhi Wei Lim, Boon-Hui Tay, Sumanty Tohari, Kiat Whye Kong, Shufen Ho, Belen Lorente-Galdos, Javier Quilez, Tomas Marques-Bonet, Brian J. Raney, Philip W. Ingham, Alice Tay, LaDeana W. Hillier, Patrick Minx, Thomas Boehm, Richard K. Wilson, Sydney Brenner & Wesley C. Warren

In this Article, the Author Information section should have stated that the miRNA reads for 17 tissues have been deposited at the Sequence Read Archive (SRA) under accession numbers: SRR12545166–SRR12545182. The original Article has not been corrected.

Author Correction: Determination of RNA structural diversity and its role in HIV-1 RNA splicing
https://doi.org/10.1038/s41586-020-2949-6
Correction to: Nature https://doi.org/10.1038/s41586-020-2253-5
Published online 6 May 2020
Phillip J. Tomezsko, Vincent D. A. Corbin, Paromita Gupta, Harish Swaminathan, Margalit Glasgow, Sitara Persad, Matthew D. Edwards, Lachlan Mcintosh, Anthony T. Papenfuss, Ann Emery, Ronald Swanstrom, Trinity Zang, Tammy C. T. Lan, Paul Bieniasz, Daniel R. Kuritzkes, Athe Tsibris & Silvi Rouskin

In the ‘Code availability’ section of this Article, the URL from which the DREEM clustering algorithm can be accessed was incorrect. The correct URL is https://codeocean.com/capsule/6175523/tree/v1. The Article has been corrected online.

Author Correction: Innovations present in the primate interneuron repertoire
https://doi.org/10.1038/s41586-020-2874-8
Published online 30 September 2020
Fenna M. Krienen, Melissa Goldman, Qiangge Zhang, Ricardo C. H. del Rosario, Marta Florio, Robert Machold, Arpiar Saunders, Kirsten Levandowski, Heather Zaniewski, Benjamin Schuman, Carolyn Wu, Alyssa Lutservitz, Christopher D. Mullally, Nora Reed, Elizabeth Bien, Laura Bortolin, Marian Fernandez-Otero, Jessica D. Lin, Alec Wysoker, James Nemesh, David Kulp, Monika Burns, Victor Tkachev, Richard Smith, Christopher A. Walsh, Jordane Dimidschstein, Bernardo Rudy, Leslie S. Kean, Sabina Berretta, Gord Fishell, Guoping Feng & Steven A. McCarroll

In Figure 2b of this Article, two labels were inadvertently swapped: the brown dot should be labelled ‘PVALB+’, and the yellow dot should be labelled ‘SST+’. In addition, the y-axis label of Fig. 4g should read “Percentage of striatal interneurons” not “Striatal interneurons (%)” and the numbers on the axis should be 0, 10, 20, 30, 40 (rather than 0, 0.1, 0.2, 0.3, 0.4). In the sentence “The TAC3+ interneuron population appeared to be shared between marmosets and humans (Fig. 4g), and constituted 30% and 38% of the interneurons sampled in marmoset and human striatum, respectively”, ‘38%’ should read ‘34%’. These errors have been corrected online.

Publisher Correction: Room-temperature superconductivity in a carbonaceous sulfur hydride
https://doi.org/10.1038/s41586-020-2955-8
Elliot Snider, Nathan Dasenbrock-Gammon, Raymond McBride, Mathew Debessai, Hiranya Vindana, Kevin Vencatasamy, Keith V. Lawler, Ashkan Salamat & Ranga P. Dias

In this Article, owing to an error in the production process, the received date was incorrectly stated as 31 August 2020 instead of 21 July 2020. This error has been corrected online.

Publisher Correction: Large Chinese land carbon sink estimated from atmospheric carbon dioxide data
https://doi.org/10.1038/s41586-020-2986-1
Correction to: Nature https://doi.org/10.1038/s41586-020-2849-9
Jing Wang, Liang Feng, Paul I. Palmer, Yi Liu, Shuangxi Fang, Hartmut Bösch, Christopher W. O’Dell, Xiaoping Tang, Dongxu Yang, Lixin Liu & ChaoZong Xia

In the legend of Fig. 2 of this Article, owing to an error during the production process, panels a and b were inadvertently labelled ‘northeast China (within 38–54° N, 120–135° E)’ and panels c and d were inadvertently labelled ‘southwest China (within 18–30° N, 95–110° E)’, rather than the other way around. The figure was correct. This error has been corrected online.

Work
Advice, technology and tools
Your story: send your careers story to naturecareerseditor@nature.com

The pandemic is taking a toll on everyone, but the burden is larger for disadvantaged groups. Credit: Getty

‘YOU ARE ALWAYS LIVING UNDER UNCERTAINTY’
Junior scientists who are members of minority ethnic groups or are financially disadvantaged describe the support they need.

The coronavirus pandemic has affected the entire scientific world, but in unequal ways: although some scientists have been able to carry on with their lives and careers, many are struggling with family obligations, financial strain and tenuous employment.

The pandemic has already dimmed job prospects in academia, and the full impacts are probably yet to be seen. Global gross domestic product (the total value of goods produced and services provided) is forecast to shrink this year by 4.9%, according to a study by the Pew Research Center in Washington DC. That decline is expected to hit low-income communities and nations especially hard.

For students and postdocs from less-privileged backgrounds (first-generation students, members of minority ethnic groups or those with financial stress) the pressures of the pandemic are, and will continue to be, particularly intense. “There are a lot of concerns about very talented individuals falling out of the pipeline,” says Barbara Natalizio, chair of the US National Postdoctoral Association, based in Rockville, Maryland, which represents more than 40,000 postdocs.

The current crisis must be a call to action, says Bea Maas, an ecologist at the University of Vienna. Maas was the lead author of a June report on the precarity of early-career researchers during the pandemic’s first wave. “There must be a collective effort by the entire scientific community, especially those in leadership positions, to respond to the short- and long-term challenges of this crisis and to
protect decades of efforts to build an inclusive scientific community,” she says.

Nature asked five early-career researchers from under-represented groups to share what they’ve experienced in the pandemic and their thoughts about surviving in the research enterprise.

CHRYSTAL STARBIRD
BEARING A HEAVY EMOTIONAL BURDEN

I’m the co-founder and co-chair of the Yale Black Postdoctoral Association at Yale University in New Haven, Connecticut. Balancing everything during the pandemic has been one of the biggest challenges of my life, which is saying something because I’ve been through a lot. On a typical day, I get up at 4:30 or 5:00 a.m. and go to the laboratory for a few hours. Then, I come home and help my three kids (ages 16, 14 and 7) with their school work. I go back to the lab in the evening and work late. My husband is a full-time student, so he can spend a lot of time at home. I’m grateful for that, but we still have disadvantages. Some of my peers have hired tutors to help with schooling during the pandemic. We can’t afford that.

For minorities, day-to-day struggles are compounded by what we see on the national level. I’m not a very emotional person. But there have been days after a shooting or killing when I’ve had trouble focusing. It’s extremely heavy. We’re carrying a lot of extra weight that makes it difficult to be scientifically fruitful.

I know people from all walks of life and of many nationalities, and it’s obvious to me that Black and Hispanic communities are being hit especially hard by the pandemic. You hear about someone’s aunt, sister, grandmother or cousin dying or going to hospital. People are coming to the lab and collecting data while their families are falling apart around them. I’m amazed by their strength.

University administrators, funders and stakeholders must think about how to level the playing field going forward. Yale has accomplished positive things, including grants to help postdocs who have lost funds because of the pandemic. They’ve also expanded the number of available slots at their daycare programmes, but those cost more than US$2,000 per month. I don’t know how helpful that’s supposed to be for a postdoc parent. There’s a real sense that universities are out of touch with what disadvantaged students are enduring.

Researchers from minority ethnic groups are seeing job opportunities that maybe didn’t exist before the launch of diversity initiatives this year. There’s a lot of enthusiasm at Yale and elsewhere in the United States for hiring scholars of colour, but those scholars are extremely overwhelmed. They might not have the confidence to take advantage of those opportunities. They’re still not sure that academia really wants them to be there.

I encourage minority scientists to seize opportunities in academia and industry. Many companies are realizing there’s a problem with the make-up of their board, and they’re ready to embrace diversity in their hiring. Now is the time for us to take advantage of momentum.

Chrystal Starbird is a structural biologist postdoc at the Yale School of Medicine, New Haven, Connecticut, and co-chair of the Yale Black Postdoctoral Association.

EMMA HERNANDEZ-SANABRIA
FOREIGN STUDENTS ARE UNCERTAIN OF THEIR PLACE

Since the pandemic broke out, I think many people have been feeling really stressed, including myself. You are always living under uncertainty. At European institutions, you don’t know how long you are going to be able to stay if you are from elsewhere (I grew up in Mexico and got my PhD in Canada before moving to Belgium for my postdoc in 2016). If you don’t have a permanent position (this is my fourth postdoc), life is very precarious.

Like many foreign postdocs in Europe, I’m on a short-term contract. Unlike some European researchers who participate in tax-free funding schemes, I have to pay my own income taxes, and I don’t have any job security. Many international students feel stuck because they don’t know whether they can go back to their own country, but they also don’t know whether they can stay where they’re at. Either way, they wonder whether they will have enough results to defend their thesis.

I was not working in the lab for three months, and I am at a career stage in which I need to produce results. I am researching the human microbiome, and it’s one of the most competitive fields in science right now. Analysing data is something that you can do at home, but it’s not the same as producing results in the lab.

On the positive side, I’ve been able to rethink my routine. Sitting at a computer for nine hours per day isn’t necessary for computational work. Flexibility is important for mental health, and universities should support that. I hope that funding agencies embrace flexibility and change how they measure productivity.

Emma Hernandez-Sanabria is a senior microbiology postdoc at KU Leuven in Belgium.

Emma Hernandez-Sanabria working with a simulator of the human microbial ecosystem. Credit: Emma Hernandez-Sanabria

ZEMMY ANG
THERE ARE PROS AND CONS TO VIRTUAL LEARNING

More funding would be a huge help. Students from disadvantaged countries who want to become academics or researchers need help. I have friends who would be very happy to go overseas and then to graduate school, but they can’t because they have to get jobs to support their families. The pandemic will make things
that much harder. They’ll have to put their plans on hold, or just not pursue them at all.

Even with things going virtual now, classes are challenging: describing your equations over Zoom is much harder than writing out what you think on the whiteboard in class.

I’m trying to imagine what my friends in the Philippines, where I’m from, are facing. Everything is shut down, and it might be impossible for them to do their work. When I lived there, I once had to give a Skype presentation from a cafe because it was the only place nearby that had a reliable Internet connection.

Before the pandemic, my Filipino passport limited me to conferences in Singapore and Hong Kong. I didn’t have the visas that I would need to attend conferences in the United States, United Kingdom or Europe. That’s a big challenge for early-career researchers from the developing world. Not being able to go to conferences puts students from disadvantaged backgrounds even further behind. So in a way, going virtual is great, although it’s not ideal to network while staring at screens.

Zemmy Ang is a PhD student in electro-optical engineering at Ben-Gurion University of the Negev, Israel.

PhD student Zemmy Ang. Credit: Zemmy Ang

ROSA FERRERIA
CONCERNS OVER AN UNCERTAIN FUTURE

My path to science is really unconventional. I was born and raised in the Dominican Republic. My family came to the United States when I was 16, and I taught myself English when I was 17. I was living on the street before I put myself through college, while working with older people. Now, I’m doing remote work in physics with Arizona State University in Tempe so that I can apply to a PhD programme.

I live in New Jersey, and there are no PhD programmes near me with courses I want to take. I can’t just leave, because I’m a single parent. I have to take care of everything on my own.

I am doing as much research as I can. People used to look down on remote academic work, but that’s changing. It’s useful for people like me, who don’t have those programmes nearby. I’m afraid to move to Arizona, not only because of all the pandemic uncertainty but also because people have told me that minorities don’t always feel welcome in that state. Aside from the money problems, this programme could create many mental and emotional issues for me, so there’s a lot to consider.

I’ve been working for years to start a PhD programme, but I don’t know whether I can get the funding for one. Even before the pandemic, none of the institutions I contacted walked me through the ins and outs of financing a PhD. I want to be an astronomer, but now I don’t know whether that will work out. I’m really scared about it. I have no idea how things will change as the pandemic worsens, but it doesn’t look good. Stories like mine are a real wake-up call for people who aren’t used to struggle.

Rosa Ferreria is a remote undergraduate student in physics at Arizona State University in Tempe.

DANIEL GONZALES
IT’S TIME TO MAKE ACADEMIA MORE INCLUSIVE

When everything shut down in March, it took me a couple weeks to settle into working from home. But I don’t think anyone quite knows the full repercussions of the shutdown. Even now, research is still slow in many places and will probably continue to be slow for six or eight more months, or maybe a year. For the three months that I couldn’t enter the lab, I tried to remain positive.

I had time to think about some theoretical aspects of the work that I had previously never had a chance to consider. The pandemic gave us an opportunity to sit back and develop some really cool theories for the technologies that we’re working on, including nanoelectric devices that would allow for both electrical and optical recordings of neural activity.

Because my job has been stable during the pandemic, I’ve been thinking more about systemic racism and what I can do to support the racial-justice movement Black Lives Matter. I’m thinking about what kind of culture we want in academia and how to foster an inclusive culture that’s empowering to people from all different types of backgrounds.

We need to let more people know about opportunities and make academia more inviting. For example, I grew up in a rural part of West Texas, and I was able to go to my hometown university thanks to a wonderful scholarship for first-generation students that paid for my schooling expenses for four years.

There were so many things I didn’t know as a first-generation student. I didn’t realize that most graduate programmes in science, technology, engineering and maths were paid for. I didn’t realize you got a stipend. I didn’t realize there were health-care benefits.

I chose the applied-physics programme at Rice University in Houston, Texas, for graduate school because it didn’t require the physics graduate-admissions examination, which would have cost money. I knew I wanted to run a research lab, but just didn’t think that I could personally achieve that. But a series of mentors empowered me to do so.

Researchers and administrators need to ask themselves why they’re not having more of the conversations that will make science more inclusive.

Daniel Gonzales is a physics postdoc at Purdue University, West Lafayette, Indiana.

Interviews by Carrie Arnold and Chris Woolston
These interviews have been edited for length and clarity.

Work / Technology & tools

The Advanced Imaging Center in Ashburn, Virginia, provides technical training for core-facility managers. Credit: Advanced Imaging Center, HHMI Janelia Research Campus

CORE CURRICULUM: LEARNING TO MANAGE A SHARED MICROSCOPY FACILITY
High-tech tools are increasingly being consolidated in specialized centres. Running these technological wonderlands takes a unique blend of skills. By Sandeep Ravindran

It’s the long wait for equipment that H. Krishnamurthy remembers from his master’s studies at Bangalore University in India. “I used to stand in line to get my turn to use a rusted hammer to nail [down] a frog for dissection,” he says.

Today, Krishnamurthy directs a facility so that other researchers in Bengaluru and throughout India need never experience that lack of access. The Central Imaging and Flow Cytometry Lab at the National Center for Biological Sciences in Bengaluru “has helped scientists to take their research to next level”, he says. “Before I started this facility, there was no paper published in Cell from India.” Since then, users have published more than half a dozen papers in Cell, as well as others in journals such as the Proceedings of the National Academy of Sciences.

Over the past 20 years, as instrument costs have risen and funding levels fallen, institutions have increasingly consolidated microscopes, mass spectrometers, flow cytometers and other high-tech equipment in specialized core facilities, where dedicated staff can cost-effectively provide a breadth of expertise and access to equipment beyond what any single laboratory could manage. Numbers are hard to come by, but Peter O’Toole, director of the Bioscience Technology Facility at the University of York, UK, has seen meetings for UK core-facility managers grow from a dozen participants in 2006 to around 200 today. And in Germany, the number of imaging core facilities doubled between 2011 and 2015, from 30 to 60.

The people tasked with running these facilities have a rare collection of skills: in-depth knowledge of the hardware they oversee, managerial and financial acumen to run what is effectively a business, and scientific know-how to guide researchers through a range of experimental systems and designs. The management aspects alone would usually fill three jobs (financial manager, project manager and people manager), says Graham Wright, acting director of the Research Support Centre at the Agency for Science, Technology and Research in Singapore. Krishnamurthy was once asked to list his responsibilities, and says he was shocked at how many he had. “This is not a 9-to-5 job, it’s a 24-hour job,” he says.

Until a few years ago, however, there was no clear career track, and few specialized training opportunities. “All the people in my generation figured it out as we went along,” says Jennifer Waters, director of the Nikon Imaging Center at Harvard Medical School in Boston, Massachusetts.

But things are changing. Waters has launched a programme at Harvard that provides technical training for core-facility managers (J. C. Waters Trends Cell Biol. 30, 669–672; 2020), and other institutions have created similar programmes, including the Advanced Imaging Center (AIC) at the Howard Hughes Medical Institute’s Janelia Research Campus in Ashburn, Virginia, and the European Molecular Biology Laboratory (EMBL) in Heidelberg, Germany. And fresh funding opportunities for managers and staff are making the career track easier to navigate.

“Probably the most important thing I’m going to do in my career is to help train this next generation,” says Waters.

Combining breadth and depth
At some core facilities, users drop off samples and receive data in return. At others, including Krishnamurthy’s, staff train users, but are not involved in the actual experiments. Many microscopy facilities lie between these extremes, with staff advising users on what imaging techniques are best suited to their projects and working with them as they operate the equipment.

Whatever the operational model, technical expertise is paramount. “The most important prerequisite to work in a core facility is that you really have to be very good at the techniques that you are operating, be it microscopy, flow cytometry or mass spectrometry,” says Teng-Leong Chew, who directs the AIC. That means having not just technical proficiency in that equipment, but also a deep understanding of its theoretical underpinnings, as well as the engineering skills to be able to install, maintain and repair it.

Also required are the skills and experience to weigh in on a diverse array of projects and experimental models. “One minute you’re working on yeast and bacteria and the next minute on brain slices, and that means not only do we see a whole variety of scientific specimens, but also a huge range of microscopy technology,” says Alison North, senior director of the Rockefeller University Bio-Imaging Resource Center in New York City.

Chew, for instance, helps researchers to work out what imaging approach to use, how to design experiments, how to handle the equipment and how to analyse and interpret the resulting images, a process that can take anywhere from hours to days of one-on-one time. “You have to provide very good training, not just in how to use the instrument, but [in] how to make sure that your experiment is accurate, ethical, quantitative and reproducible,” he says.

Staying up to date with technology is crucial, be it through the literature, conferences or word-of-mouth. “That’s part of the job, to scout for new technology and make sure you’re using it earlier,” says Stefanie Reichelt, who runs the light-microscopy core facility at the Cancer Research UK Cambridge Institute at the University of Cambridge. Popular meetings for microscopy core-facility staff include the European Light Microscopy Initiative; Focus on Microscopy; the UK Royal Microscopical Society’s Microscience Microscopy Congress; and the Seeing is Believing conference.

Yet despite their crucial role in the conduct of research, core facilities are often as much businesses as laboratories, and staff rarely receive authorship unless they also provide significant scientific input. Acknowledgements are more common, although not guaranteed. “Sometimes you do hard work, and you will not see that reward directly coming in terms of an acknowledgement,” says Jan Peychl, head of the Light Microscopy Facility (LMF) at the Max Planck Institute of Molecular Cell Biology and Genetics in Dresden, Germany. To help address this, the Royal Microscopical Society and the US Association of Biomolecular Resource Facilities (ABRF) have developed authorship guidelines that core facilities can provide to their users.

Service providers
As microscopy core facilities become increasingly prevalent, they offer an intriguing career option for PhD and postdoctoral researchers. “I think it is a great career,” says North. But it’s not a position for someone who’s just applying “because they’re worried that they won’t get a job” as a principal investigator, she says.

North looks for applicants who have experience with different types of specimen and in training others in microscopy, and who have been responsible for troubleshooting. The exact scientific training can vary, but a “lack of ego” is a must, she says. “If it’s all about getting credit for the work you’ve done, then this is not the right job for you.”

Indeed, core-facility staff have very different jobs from conventional scientists, and organizations might need to implement performance metrics for core-facility managers to match, says Wright. These metrics “are likely not focused on the number of publications or grants awarded, but more on the users trained, level of user satisfaction, cost recovery, acknowledgements in publications et cetera”, he says.

Facility-director salaries can vary widely, from around £35,000 (US$47,000) per year for postdoc-like positions to £90,000 for senior managers, says O’Toole. “These can be very different roles with a very similar job title,” he says. Those on the lower end of the scale are likely to have many fewer responsibilities and staff, will not write grant applications and might not fully control their budgets. But they can progress by expanding their core facility or moving to a more senior position at a larger facility.

Harvard Advanced Microscopy Fellows Rylie Walsh and Federico Gasparoli. Credit: Jennifer C. Waters

Before he was a core-facility director, Peychl was a medical doctor, a stage actor and an assistant professor. The common denominator, he says, is people skills: “I deal with hundreds of people, and I need to understand their motivations, and their needs and emotions as well,” he says.

As a junior assistant professor at the Charles University Faculty of Medicine in
© 2020 Springer Nature Limited. All rights reserved.
Work / Technology & tools
Hradec Králové in the Czech Republic, Peychl says, he was always the one taking care of his department’s common microscopes. “The person who first goes and fixes it probably is highly qualified to become core-facility manager,” he says. “It’s important that you have this inner drive to support people with cool technologies.”
In 2001, Peychl’s interest in microscopy inspired a career change, and he applied for an imaging-specialist position at the LMF. “Of course, they were surprised that a medical doctor would apply for a technician job, but they immediately offered me a postdoctoral fellowship to support the work of the core imaging facility,” he says.

The business end
Peychl’s experience is unusual in that training opportunities for core-facility management are relatively rare. But they are increasing in number.
The German Society for Microscopy and Image Analysis, known as German BioImaging, has offered a course on core-facility management and leadership in Germany since 2013, and Global BioImaging, an international network of imaging infrastructures and communities coordinated from EMBL Heidelberg, offers similar courses around the world. The ABRF and Core Technologies for Life Sciences, a Paris-based non-profit association of core-facility scientists and staff, have workshops that teach business skills such as accounting and budgeting. Staff can sometimes take business-school classes offered by the universities where their facilities are based. And some managers, including Wright, even complete a master’s of business administration, which he says has been “incredibly useful”.
German BioImaging and Global BioImaging also organize job-shadowing programmes, which allow staff to pick up skills from their peers. “I see real value in visiting other core facilities to understand how they operate and bring home the relevant bits to help improve our own operations,” says Wright.
Grant-writing skills are essential. “The success of a core facility, to some extent, is reflected by the number of shared instrumentation grants that it secures,” says Chew. In the United States, the National Science Foundation and Department of Defense both fund core facilities, and the National Institutes of Health’s S10 shared instrumentation grant is “almost tailor-made for core-facility directors”, Chew says. Facilities in other parts of the world, including most European countries, are also funded mainly by government grants.
Most of these grants cover only instrumentation, but some, including the German Research Foundation’s core facilities funding programme and the imaging scientists programme from the philanthropic organization the Chan Zuckerberg Initiative in Redwood City, California, cover staff, too.
Facility staff positions are often not permanent, which can lead to regular loss of institutional knowledge. “Usually you need minimum of a year to get a person up to speed,” says Peychl. Such turnover presents challenges, he says, but it also encourages flexibility. For instance, facility managers had to react quickly when faced with the COVID-19 pandemic, both to keep their staff and users safe and to minimize disruptions.
Peychl, working with Elisa Ferrando-May at the University of Konstanz Bioimaging Center in Germany and others, published COVID-19 guidelines for core facilities, such as avoiding face-to-face training and providing remote support to microscope users (S. Dietzel et al. Cytometry A 97, 882–886; 2020). Roland Nitschke, head of the Life Imaging Center at the University of Freiburg, Germany, set things up at his facility so that he could remotely control microscopes to show users what to do. The set-up required ultra-high-resolution cameras, and scientists started using the same cameras to check in on long-running experiments from home. “Maybe it’s the only good thing to come out of the coronavirus,” says Nitschke.

A core crash course
To bring new recruits up to speed, some managers turn to external courses and workshops. Waters teaches yearly microscopy courses at Cold Spring Harbor Laboratory in New York, and North does the same at the Woods Hole Marine Biological Laboratory in Massachusetts. And Krishnamurthy provides a training programme on confocal microscopy and flow cytometry in Bengaluru. But, “in the end”, says Peychl, “it’s all about hands-on experience”.
Most researchers use just one or two types of microscopy during their PhD or postdoc. “We have everything from single molecules to organoids and embryos coming into the core, and we have 15 different imaging modalities,” says Waters. “The idea of anybody walking out of a postdoc and into a core facility, and being able to immediately know what they’re doing, is ridiculous,” she says.
That’s what motivated Waters to start Harvard’s Advanced Microscopy Fellowship in 2013. Waters originally funded two fellows by reallocating salary for a staff member, but has since been supported by microscope manufacturer Nikon and three departments at Harvard Medical School. The programme costs about US$80,000 a year for each fellow, and participants are guaranteed two years of funding. Fellows spend half of their time on core service work, such as training users and maintaining microscopes, and another half on an independent project, such as writing code to assess the quality of microscopy images.
Fellows also learn management and budgeting skills, and work on grants with Waters. And they learn the fine art of dealing with clients. “I want them to have the confidence to sit in a room with a group of faculty and say, ‘I know that technology looks interesting, but you don’t have the need for it,’” says Waters. “If I have to write an uncomfortable e-mail to a faculty member, I will write it and then forward it to my postdoc so they can see how I worded it,” she says. “They get to see how those situations are handled, and they leave with a little bit of a toolkit of how to manage that.”
Waters has two current fellows and four alumni who have gone on to work at or run other core facilities. “I want them all to walk out of here being technical experts who deserve respect and are given autonomy,” she explains. Staff might also go into industry, especially to microscopy companies.
Other institutes have created similar programmes. For instance, Janelia’s Advanced Microscopy Fellows spend half of their time getting hands-on experience at the AIC and the other half on self-directed training, whether that’s learning how to code or how to run an international microscopy workshop. “That 50% protected time allows them to hone their skill to become a successful core director in the future,” says Chew.
In Europe, EMBL’s ARISE fellowship programme is launching training for 62 fellows over the next five years, supported by some €12.7 million (US$15 million) from the European Commission and EMBL. Calls for applications went out in November, and fellows will be hosted at one of six EMBL sites across Europe. “We want to build through the programme the future heads or senior staff of research infrastructure facilities,” says Tanja Ninkovic, ARISE programme manager at EMBL Heidelberg. As well as learning the ins and outs of core-facility management, fellows will pursue an independent research project and study technology transfer, entrepreneurship and science policy. “These are the skills that are needed by research infrastructure scientists regardless of the technology that the facility is offering,” says Ninkovic.
Peychl, who attended a German BioImaging course for facility managers, sees the increase in core-facility management courses as a sign that these positions are increasingly valued. And he encourages others to consider the role. “For those who like to support others, who are technologically inclined and who have people skills, this could be a rewarding career.”

Sandeep Ravindran is a science writer based in Washington DC.

The back page
Where I work
John Fulmer
I have just spent six months in the Great Barrier Reef on the research vessel Falkor. Every cruise tackles cutting-edge science and, as lead technician, I’ve been able to witness so many firsts: the RV Falkor is the first ship to map much of this area at high resolution, and to put a remotely operated vehicle (ROV) down in the Great Barrier Reef at depths of up to 2,000 metres. So everything we’re seeing is new. With the ROV’s two arms, we gather samples of rocks, corals, sediments, jellyfish — everything we come across down there — for collaborators to analyse later.
Recently, we discovered a peculiar knoll rising from the sea floor 2,000 metres down, peaking 300 metres below the surface. The odd thing is that it shouldn’t be there. There’s little to no volcanic activity in the area to create a mound, and it should have been eroded by water like its surroundings. Analysis of the rock samples we took from the knoll might explain things.
I’m the main liaison between the ship’s crew and the scientists travelling with us. Three technicians maintain all the shipboard systems, including an instrument to measure ocean conductivity, temperature and density; a multibeam scanner to map the ocean floor; and the ROV SuBastian, which I’m working on in the photo.
We’ve been lucky that our institution, the Schmidt Ocean Institute in Palo Alto, California, has kept the work going during the pandemic. Everyone undergoes a two-week hotel quarantine and COVID-19 testing before getting on board. Other researchers haven’t been able to get to sea, so we livestream our ROV dives. Collaborators can log in and tell us what looks worth sampling. Anyone can watch and ask the scientists questions, and receive live answers.
Science never sleeps. I work 12-hour shifts every day for 6-month stretches, with no days off, and I’m on call all the time. I’m just starting six months on leave, but sometimes I’ll tune in to the dives. It’s exciting stuff.

John Fulmer, lead technician for the Schmidt Ocean Institute’s RV Falkor, is currently on leave in Halifax, Canada. Interview by Amber Dance.
Photograph by the Schmidt Ocean Institute.

outlook
Sustainable nutrition
For more on sustainable nutrition visit nature.com/collections/sustainable-nutrition-outlook

If it was easy to change the way we eat, malnutrition in all its forms — undernutrition, overnutrition and micronutrient deficiency — could have been eliminated long ago. Everyone would have access to affordable food and choose to eat the quantity and variety that keeps them in optimal health.
That’s the dream. The reality is that humans have a food problem. It is a complex and multidimensional issue, but in broad brush strokes: huge numbers of people go hungry; large nutritional imbalances persist between high- and low-income nations and regions; and the food system, from production to supply and consumption, is both failing society and damaging the planet.
The COVID-19 pandemic has highlighted the problems, and exacerbated them (see page S57). Political, economic and cultural obstacles are prolific on the pathway to achieving a sustainable global diet (S54). Even if a suitable global menu could be agreed on, changing eating behaviours at scale is a formidable and understudied challenge (S70).
The mainly plant-based diet that nutritional scientists recommend for physical and, more recently, mental health (S63) is better for the environment than diets that are heavy in meat and highly processed foods. To reduce our reliance on farmed meat, scientists around the world are developing affordable protein alternatives. Researchers are racing to transform lab-grown meat from a headline-grabbing novelty into a viable industry supplying supermarkets (S64). And according to projections, aquaculture is ramping up to overtake wild fish stocks as the main source of aquatic protein in diets by 2050 (S60). Farming methods that intensify agricultural production while rebuilding and sustaining natural systems are also becoming more widespread (S58).
Diversity is key. There is no single solution that will guarantee sustainable nutrition for everyone. In the same way that the pandemic demands an integrated, cooperative and global response, in which science plays its part, so does feeding the global population.
We are pleased to acknowledge the financial support of Ajinomoto Co., Inc., Yakult Honsha Co., Ltd and NTT Corporation in producing this Outlook. As always, Nature retains sole responsibility for all editorial content.

Catherine Armitage
Chief editor, Nature Index

Contents
S54 GLOBAL DIET
Healthy people, healthy planet
Eating habits must change if we are to feed 2050’s population
S57 OPINION
Cooperate to prevent food-system failure
Jessica Fanzo explains why governments need to work together
S58 AGRICULTURE
Natural solutions for agricultural productivity
How to farm intensively and sustainably
S60 AQUACULTURE
Cultivating a sea change
Growing the seafood industry sustainably
S63 HEALTH
Eating for better mental health
Mood could be linked to the microorganisms in our gut
S64 CELLULAR AGRICULTURE
Cell-based meat with a side of science
Growing meat at scale is still a challenge
S68 SUSTAINABLE NUTRITION
Research round-up
The latest studies
S70 BEHAVIOUR
Changing diets at scale
Different communities and cultures require different approaches

Editorial: Catherine Armitage, Herb Brody, Richard Hodson, Jenny Rooke
Art & Design: Mohamed Ashour, Annthea Lewis, Denis Mallet
Production: Nick Bruni, Kay Lewis, Ian Pope, Karl Smart
Sponsorship: Yuki Fujiwara, Natasha Boyd, Takeaki Ishihama
Marketing: Gavin Buffett
Project Manager: Rebecca Jones
Creative Director: Wojtek Urbanek
Publisher: Richard Hughes
VP, Editorial: Stephen Pincock
Managing Editor: David Payne
Magazine Editor: Helen Pearson
Editor-in-Chief: Magdalena Skipper

On the cover
Eating well for the health of the planet. Credit: Sophie Casson for Nature

About Nature Outlooks
Nature Outlooks are supplements to Nature supported by external funding. They aim to stimulate interest and debate around a subject of particularly strong current interest to the scientific community, in a form that is also accessible to policymakers and the broader public. Nature has sole responsibility for all editorial content — sponsoring organizations are consulted on the topic of the supplement, but have no influence on reporting thereafter (see go.nature.com/33m79fz). All Nature Outlook supplements are available free online at go.nature.com/outlook

How to cite our supplements
Articles should be cited as part of a supplement to Nature. For example: Nature Vol. XXX, No. XXXX Suppl., Sxx–Sxx (2020).

Contact us
feedback@nature.com
For information about supporting a future Nature Outlook supplement, visit go.nature.com/partner

Copyright © 2020 Springer Nature Ltd. All rights reserved.
Nature | Vol 588 | 10 December 2020 | S53

outlook
SOPHIE CASSON FOR NATURE

Healthy people, healthy planet
To provide 2050’s estimated 10 billion people with a healthy diet, global eating
habits need to become more sustainable. By Chris Woolston
Every morsel of food from every plate, bowl and cooking pot around the world takes a small bite from Earth’s resources. The human diet places a strain on the environment, water resources, biodiversity and just about every other measure of planetary health. With so much at stake, researchers have turned their attention to a pressing question: what sort of diet can the planet realistically support?
The answer requires insights from fields such as nutrition, agriculture and climate research. “We need to produce food groups that are good for health in ways that are restorative to the planet, rather than extractive,” says Corinna Hawkes, director of the Centre for Food Policy at City, University of London. The particular foods on the plate will vary from one place to another, she says, but those meals need to add up to something more sustainable than society’s current fare.
“When you look carefully at the big systems that regulate the stability of our planet, food is a dominant player in essentially all of them,” says Johan Rockström, an environmental scientist at Stockholm University. In 2019, Rockström, Hawkes and other members of an international group of scientists proposed the EAT-Lancet diet¹, a global meal plan that could, in theory, feed 2050’s estimated population of 10 billion people (see ‘Planetary-health diet’). That plan called for drastic cuts in meat consumption and a much higher intake of fruits and vegetables. But it proved controversial with meat-industry proponents and economists, and the quest for a planetary diet
S54 | Nature | Vol 588 | 10 December 2020

APHRC
Elizabeth Kimani-Murage addresses community members at a meeting about food insecurity in Nairobi, Kenya.
continues. When researchers and policymakers convene at the United Nations Food Systems Summit in late 2021, a healthy-planet diet will be near the top of the agenda.
The goal will be a basic framework, not an item-by-item menu, says Agnes Kalibata, a food-policy specialist in Kigali, Rwanda, who will be leading the summit as the UN special envoy. “Diets are influenced by cultures and custom,” she says. “We can come up with the principles of what a good diet will look like. We need to find a balance.”

Sustainability on a plate
Most researchers agree that the current diet is not sustainable. A 2018 analysis² estimated that food production releases the equivalent of 13.7 gigatonnes of carbon dioxide in greenhouse gases into the air each year — more than one-quarter of all human-caused greenhouse gases. The same report estimated that agricultural irrigation accounts for about two-thirds of all fresh water used by humans. And about 37% of the planet’s land area, excluding deserts and ice sheets, is already dedicated to food production. That footprint is likely to grow as the population increases.
Some foods take up many more resources than others. At the upper end, just 100 grams of beef protein can result in the release of the equivalent of 105 kilograms of CO₂. The same amount of protein from a well-managed field of peas, by contrast, typically releases the equivalent of only about 0.2 kg of CO₂.
These orders-of-magnitude differences mean that any vision of a more sustainable diet has to include marked reductions in the meat consumption of high-income countries, Hawkes says. She notes that consuming a lot of red meat can raise the risk of cancer and heart disease. “It’s not great for our health, and it’s not great for our planet,” she says. “There’s a strong alignment between health and sustainability.”
This convergence of nutrition and conservation is a central message of the EAT-Lancet diet. The authors started by reviewing the best evidence for constructing a diet that would optimize human health and reduce the global toll of food-related health conditions, such as diabetes, heart disease, cancer and obesity. The researchers didn’t even consider the impacts on climate or sustainability until the nutritional framework had been set, Rockström says.
The EAT-Lancet commission ultimately proposed a ‘flexitarian’ diet that spans a spectrum of food groups. It also suggested vegan and vegetarian options. Plants form the foundation of the commission’s flexitarian diet, which recommends the daily consumption of 300 g of vegetables, 200 g of fruit, around 230 g of whole grains and 125 g of plant-based protein-rich foods, such as lentils, nuts and dry beans. The diet calls for a mere five servings of animal protein per person per week, including about 200 g of fish and 200 g of white meat. Controversially, the diet allows for just 100 g or so of red meat, around one and a half servings, per week. Rockström notes that’s a significant reduction from the roughly 700 g of red meat consumed each week by people in places such as North America and Europe, but it’s much more than the amount typically eaten by people in low-income countries.
Despite its dire environmental impacts, meat still has an important place in the global diet. On a nutritional level, the proteins and minerals from animal products could be a real boost for malnourished populations around the world, Hawkes says. “For infants who otherwise eat rice or starchy cassava, meat is an incredibly efficient way of boosting micronutrient status,” she says. What’s more, she says, “meat has tremendous cultural significance in people’s lives — it’s associated with high status”.

Expensive gains
After investigating the potential environmental impacts of the EAT-Lancet diet, the authors concluded that a nutritious diet for people could also be good for the planet. “We found that a healthy diet combined with sustainable agricultural practices would have positive impacts on biodiversity, land, water, nutrients and climate,” Rockström says. The most significant improvements tied to a change in diet would come from a reduction in phosphorus and nitrogen pollution in waterways and greenhouse-gas emissions. The commission estimated that the new meal plan could cut related greenhouse-gas emissions by about half — down to

outlook
about 5 gigatonnes of CO₂ equivalent.
If the EAT-Lancet diet was adopted, it would undoubtedly be a healthy step forward for people and the environment. But it has faced fierce opposition over its potential to devastate the animal-husbandry industry, and has been criticized as being too expensive for many consumers. One analysis³ calculated that nearly 1.6 billion people would be too poor to buy the recommended mix of foods, especially the meat, fruits and vegetables. “No amount of nutritional knowledge is going to get them there because they can’t afford it,” says William Masters, one of the study’s authors and a nutritional economist at Tufts University in Boston, Massachusetts. “They may have US$1 a day to spend on food but they would need $1.50,” he says.
The analysis found that the EAT-Lancet model was about 60% more expensive than the cheapest alternative diet that could provide all 20 essential nutrients people need to survive. That bargain diet — which consists almost entirely of starchy staples, such as rice, cassava and flour, with very little fish, meat, fruits or vegetables — would also be more environmentally friendly, but Masters cautions that it is not healthy. It lacks, among other things, the fibre needed for optimal digestion, the phytochemicals that can protect against cardiovascular disease and cancer, and the healthy fats that support the brain.
Poverty is a crucial barrier to improved global nutrition, but, Masters says, it is important not to lose sight of the greater proportion of the world’s population that could afford to eat better, but doesn’t. “A vastly larger number of people could walk into a grocery store tomorrow and buy a healthier diet that’s more environmentally sustainable than the one they eat now,” he says.

Local eating, global impacts
Any modification to the global diet will have to start with changes at a local level. Elizabeth Kimani-Murage, a nutrition specialist with the African Population and Health Research Center in Nairobi, Kenya, predicts a future in which residents of her city feed themselves with locally grown foods from kitchen gardens and urban fruit trees. Small animals, such as chickens, rabbits, and even termites and crickets, would add much-needed protein. The balance of fruits, vegetables, grains and protein would look much like the EAT-Lancet diet, but with a decidedly East African flavour.
The proposal from her institute is one of ten to make the finals of the Food System Vision Prize, a global contest sponsored by the Rockefeller Foundation in New York City. The winners are due to be announced in December. Kimani-Murage says she wants Kenya to move away from the sort of large-scale industrial farms that currently feed cities around the world. “Food has been so commercialized or commodified,” she says. “It’s produced for money and not for feeding people. We want to continue this local production of food even as the world urbanizes.” She notes that local food production could also significantly reduce the costs of production and shipping, potentially increasing the affordability of an EAT-Lancet-style diet.
Feeding people at a local level is the key focus of the 2021 UN Food Systems Summit. The current global diet, Kalibata says, is unbalanced, largely because of gaps in wealth and opportunity. In poor areas around the world, people tend to fill their stomachs with starchy, carbohydrate-heavy food because they can’t afford other, more nutritious alternatives. “We have to rethink our diets based on who are the most vulnerable among us,” Kalibata says. She says that high-quality proteins, whether meat- or plant-based, need to replace many of the carbohydrates eaten in poorer areas.
Kalibata thinks it will be possible to meet nutritional needs in the future, but it will take concerted effort to reduce waste (see ‘Impact of unused food’), localize production and expand the food options of the global poor. All these issues will be on the agenda at the summit, and Kalibata hopes they’ll inspire a global plan of action in the years to come. “We’ve had food summits before,” Kalibata says. “We have to make this one different. We have to deliver on the goals.”
Rockström hopes that the EAT-Lancet plan, despite its shortcomings, will still serve as inspiration for efforts to put humanity’s nutritional needs on a more sustainable footing. “It is not the final answer in any way,” he says. “But it was the first time people got together from disciplines in health, agriculture and sustainability to try to answer the big questions. If we’re going to take seriously that we’ll have 10 billion citizens who all need to eat, we’ll need to live in a certain balance.”

Chris Woolston is a freelance writer in Billings, Montana.

1. Willett, W. et al. Lancet 393, 447–492 (2019).
2. Poore, J. & Nemecek, T. Science 360, 987–992 (2018).
3. Hirvonen, K., Bai, Y., Headey, D. & Masters, W. A. Lancet Glob. Health 8, e59–e66 (2020).

PLANETARY-HEALTH DIET
If every person had a daily food allocation to sustain not only their health but also that of the planet, what would it look like? The answer to this question, in a study¹ from the EAT-Lancet commission, places an emphasis on plant-based foods, and recommends an amount of animal-derived protein much lower than that eaten in high-income countries but much higher than the amount consumed in low-income countries.
Macronutrient intake (grams per day)
Vegetables: 300
Dairy foods: 250
Whole grains: 232
Protein sources: 209 (legumes 75; nuts 50; poultry 29; fish 28; beef, lamb and pork 14; eggs 13)
Fruit: 200
Added fats: 52
Tubers or starchy vegetables: 50
Added sugars: 31
Source: ref. 1

IMPACT OF UNUSED FOOD
Food loss (from post-harvest through the supply chain and up to, but not including, retail) and waste (at retail and consumption level) from the main food groups have negative environmental impacts. They all have a blue-water (cubic metres of water wasted), carbon (tonnes of carbon dioxide equivalent emitted) and land (hectares of land used) footprint per tonne of food lost or wasted.
[Bar chart. Bars: food loss and waste; blue-water footprint; carbon footprint; land footprint. Axis: proportion (%), 0 to 100. Categories: cereals and pulses; fruits and vegetables; roots, tubers and oil-bearing crops; meat and animal products.]
Source: FAO

outlook
Cooperate to prevent food-system failure
To keep people nourished during a global pandemic, our food systems must evolve, and governments must work together, says Jessica Fanzo.
CHRIS HARTLOVE

As journalist Joan Didion wrote in her 1967 essay ‘Goodbye to All That’, “It is easy to see the beginnings of things, and harder to see the ends.” She was writing about her love affair with the city of New York, but the same can be said of COVID-19.
How the pandemic began is reasonably well understood — the virus, SARS-CoV-2, probably made its way from wild bats to humans through a food market. COVID-19 is the latest in a long line of diseases that have crossed from animals to people, including HIV/AIDS, severe acute respiratory syndrome and Ebola. In fact, 60% of emerging infectious diseases are zoonotic, and of the pathogens that cause these, at least 71% originate in wildlife¹. The reshaping of habitats around the world, often initiated by the need to grow more food, puts people in ever closer contact with wild animals and makes the transmission of infections more likely.
How the pandemic will end, and what damage it will cause, is less clear. So far, there is no end in sight. Many people will be affected forever — economically, physically, socially and psychologically. The World Bank estimates that up to 115 million extra people will fall into extreme poverty (living on less than US$1.90 per day) in 2020 owing to the economic shocks of the pandemic. This, in turn, will have significant impacts on food security, nutrition and health. It is projected that 130 million more people will face acute food insecurity by the end of 2020, in addition to the estimated 135 million who faced it in 2019.

activities involved in producing, processing, distributing, preparing and consuming food, and the people who influence those activities — in multiple ways. It is reducing food-production capacity, slowing distribution and limiting access to both markets and financial or nutritional safety nets. Farmers are economically vulnerable owing to the tight profit margins associated with their industry. Government restrictions on the movement of people have hindered farmers’ access to necessary goods, labour and equipment, slowed the planting and harvesting of crops and affected feeding of livestock. Restrictions have also impeded the ability to move food to markets, ports and across borders, leading to increases in food loss — particularly of perishables, such as meat and dairy. Food waste has also increased as a result of reductions in the available workforce at meat-processing plants. Together with higher levels of unemployment and loss of income, this has resulted in an increase in the number of people struggling to access a healthy diet. Many people are already opting instead for staple grains and unhealthier, highly-processed foods that are cheaper and have longer shelf lives⁴. This pandemic, and the need to stave off the next one, bolsters the already compelling case for ensuring the global food supply is safe, nutritious and equitably distributed.
Governments and businesses should prioritize ensuring that producers are making healthy food and that consumers have access to it. They should support and invest in food-assistance programmes during and after the pandemic. Governments must support the United Nations’ $10-billion COVID-19 Global Humanitarian Response Plan, set up so its agencies can provide the most marginalized and vulnerable populations with basic services, such as COVID-19 testing materials, medical equipment, food, water and basic health coverage, such as vaccines. As of September, the programme had received less than 30% of its target.
The integrated One Health approach (addressing risks at the intersection of human, animal and environmental health) is crucial in responding to COVID-19, recovering from it and preparing for the next zoonotic pandemic. To minimize viral reservoirs and contact between virus-carrying animals and people, wildlife habitats must be protected against urbanization and deforestation. Governments need to police the illegal sales of wildlife in food markets and the global food trade, and to complement this with public-health disease-prevention programmes and messaging.
The health of those who are already undernourished could Stronger surveillance tools to track potential zoonotic and
decline further — particularly older, vulnerable and margin- food-borne illnesses across food systems are also needed.
alized people. Disruptions to health care in many low- and These recommendations to ensure food systems func-
middle-income countries owing to COVID-19 could lead tion effectively during the pandemic and long after cannot
to around 193,000 additional deaths among children per work without a united global effort. Instead of the splin-
month2. Obesity and non-communicable diseases are signif- tered responses to the COVID-19 crisis seen so far, involving
icant risk factors for hospitalization with COVID-19, and they political polarization and geopolitical competition, pol-
can result in medical complications for both young and older iticians must embrace global cooperation and inclusion.
people. Obesity and metabolic disorders are also factors in Jessica Fanzo Governments should not face inward. They should double
the disproportionate risks of hospitalization and death in is a food-system down on opportunities to re-engage and collaborate on the
low-income and ethnic minority populations in high-income researcher and interlinked challenges of climate change, malnutrition and
countries. In Chicago, Illinois, for example, nearly 70% of the nutritionist at Johns environmental collapse.
people who have died from COVID-19 were Black, although Hopkins University in
Black people make up only 30% of the population3. Baltimore, Maryland. 1. Cutler, S. J. et al. Emerg. Infect. Dis. 16, 1–7 (2020).
2. Roberton, T. et al. Lancet Global Health 8, e901–e908 (2020).
Early evidence suggests that the pandemic is trounc- e-mail: jfanzo1@jhu. 3. Yancy, C. W. J. Am. Med. Assoc. 323, 1891–1892 (2020).
ing the functionality and efficiency of food systems — the edu 4. Belén Ruiz-Roso, M. et al. Nutrients 12, 1807 (2020).

© 2020 Springer Nature Limited. All rights reserved.
outlook

Natural solutions for agricultural productivity

Scientists are pursuing sustainability strategies for intensifying production to tackle food security and environmental crises. By Michael Eisenstein

A farmer inspects her maize crop, grown using a ‘push–pull’ approach.
THE ‘PUSH–PULL’ FARMING SYSTEM: CLIMATE-SMART, SUSTAINABLE AGRICULTURE FOR AFRICA/ICIPE/GREEN INK LTD. UK

On paper, the global agriculture sector has done an admirable job of keeping pace with a growing population. According to the United Nations’ Food and Agriculture Organization, agricultural output per person has increased by 50% since 1960 (impressive, considering the number of mouths to feed has more than doubled).

But the reality is messier. Many people, including those in high-income nations, lack reliable access to nutritious food. And food security is an ongoing struggle for people in poorer regions. Even transient disruptions can have far-reaching consequences. One article¹ described the global food supply as being “on a razor’s edge”: weather events or natural disasters in one part of the world can cause the price of grain everywhere to spike by more than 50%.

“Globally, we have to increase food production by 60%, and in some areas we have to increase by 100%,” says P. V. Vara Prasad, a crop ecophysiologist at Kansas State University, Manhattan.

Over the past 50 years, producers increased agricultural output in much of the world through the ‘green revolution’. But this revolution has been environmentally harmful, relying heavily on chemical pesticides and fertilizers that have inflicted lasting damage on the soil and water supply. Natural biodiversity has been sacrificed to create vast monoculture fields. And in many low-income nations, survival depends on coaxing greater productivity from existing plots as more and more people scramble for limited resources, says Bernard Vanlauwe, a soil scientist based in Nairobi at the International Institute of Tropical Agriculture.

Many agricultural researchers are now looking to a set of practices known as sustainable intensification. The specifics vary depending on the setting, but a growing number of examples from around the world highlight the possibility of a second green revolution, one that might better live up to its name.

Many roads to sustainability

The concept of sustainable intensification was popularized² in 1997 by Jules Pretty, an environmental scientist at the University of Essex in Colchester, UK. His goal was to challenge the idea that increasing yield is inherently incompatible with environmental health, with an agricultural philosophy that encompasses parameters such as biodiversity and water quality as well as the social and economic welfare of farmers. Researchers have defined the scope of sustainable intensification in different ways, but the big picture, says Pretty, entails recognizing that agriculture is inexorably connected with the environment and designing cultivation strategies accordingly. “Components of sustainable systems tend to be multifunctional,” he says. “You want a diverse system that provides support to pollinators, fixes nitrogen and provides a break against insects.” Advocates of sustainable intensification recognize that global agriculture can’t be reinvented in one fell swoop and that progress will come from incremental steps that improve efficiency, as well as more-dramatic measures that redesign the farming landscape.

Lucas Garibaldi, an agroecologist at the National University of Río Negro in Bariloche, Argentina, has focused on pollinators as a crucial component of what he calls ecological intensification. “Crop yield depends not only on the count of pollinators, but also on the biodiversity of pollinators,” says Garibaldi. “Millions of honeybees alone will not replace the function of diverse species of wild bees and butterflies and birds.” He notes that different bees pollinate different crops, but also allow more efficient pollination for some plant species. To create a haven for these airborne assistants, Garibaldi advocates minimizing pesticide use and including non-agricultural zones in farmland. These could be wild-plant borders that surround fields or just hedgerow-like strips of flowers that are appealing to the bees that traverse them.

Growing a mix of crops can have many benefits, including attracting pollinators. Conventional monoculture leaves soil exposed for much of the year, Garibaldi says. This creates opportunities for weeds to grow (necessitating herbicides) or leaves soil susceptible to erosion. With multiple crops or rotation throughout the year, more durable root
systems that densely and extensively permeate the ground can be established, reinforcing the soil and preventing the nutrient depletion associated with long-term monoculture.

Crops rely on pollinators such as bees.
CHRIS GOMERSALL/2020VISION/NATUREPL.COM

Diversity can also eliminate the need for pesticides. Pretty says around 180,000 farmers in Kenya, Uganda and Tanzania now use push–pull cropping practices when growing maize. They plant grasses around the edges of maize plots that produce chemicals that ‘pull’ a common pest, the maize stalk borer (Busseola fusca), away from crops, while the maize itself attracts parasitic wasps that prey on the stalk borer. The farmers also intersperse legumes of the genus Desmodium with the maize that enrich the soil with nitrogen, and produce compounds that ‘push’ away pests and kill off a genus of invasive weed known as Striga.

Sustainable soil management is a thorny issue, particularly in resource-limited settings. Vanlauwe notes that nutrient depletion is one of the greatest threats to yield for African farmers, making a hard-line approach to sustainability unrealistic. “People who say you can trigger agricultural development in Africa without fertilizer do not have on-the-ground experience,” he says. But there are environmentally friendly ways to feed the soil. Jo Smith, a soil scientist at the University of Aberdeen, UK, has been equipping farmers in Africa and Asia with anaerobic digesters: simple systems that use microbes to convert animal manure into biogas for fuel and leave a nutrient-rich bioslurry. “It’s like giving them a little fertilizer factory: it gives you available ammonium that the crop can take up quickly,” she says. The biogas is also less harmful than conventional fuels, reducing household air pollution and improving quality of life, Smith adds.

Much of the world’s farming takes place on smallholder plots. One study³ estimated that one-third of the global food supply is produced on farms of less than two hectares. This fragmentation can make it challenging to introduce sustainable intensification practices. “Smallholder production systems are absolutely risk-averse,” says Vanlauwe. “Falling from earning US$100 to $50 a month can be the difference between being not-hungry and being hungry.”

Close collaboration with individual farmers is needed, but this is difficult to achieve at scale. Fortunately, smallholders are increasingly participating in collectives that can accelerate information sharing and reduce the risk associated with adopting new cultivation strategies. In August⁴, Pretty and his colleagues reported that, worldwide, around 8 million such groups have formed over the past two decades. “That’s about 240 million people working in collective-action efforts around areas like irrigation, forest management, pest management and water,” says Pretty. By partnering with these groups, researchers can design programmes that are more likely to be compatible with social, cultural and environmental conditions, and establish local networks of collaborators to facilitate the dissemination of information.

Some governments are also taking a more active role. Ethiopia, for example, has focused on aspects of ecological repair by establishing ‘exclosure’ areas for depleted soils. “Areas are fenced off, and after about ten years the land starts to recover,” Smith says.

In China, Fusuo Zhang, a plant-nutrition specialist at the China Agricultural University in Beijing, and his colleagues are working with government officials to mobilize an effort to help smallholder farmers across the nation transition to more evidence-based, sustainable cultivation. This includes selecting seed varieties that are suited to a given plot, using modelling techniques to guide planting based on levels of sunlight, water and nutrients, and optimizing the timing and density of seed planting. “We sent faculty members and groups of students to live among the farmers in the villages, and work with them to try to change their management,” says Zhengxia Dou, an agricultural scientist at the University of Pennsylvania in Philadelphia, who collaborated with Zhang’s team. By 2015, the effort had grown to include nearly 21 million farmers across China, who, on average, achieved a more than 10% boost in yield while using around 15% less fertilizer and reducing their greenhouse-gas output⁵.

Many farmers in India are embracing a national programme known as zero-budget natural farming (ZBNF). This cultivation strategy involves using soil microbes and mulch rather than synthetic fertilizers to enrich lands. Farmers in several Indian states are pursuing the approach, including around half a million farmers in Andhra Pradesh. But some scientists are concerned that the approach is untested and unproven. Last year, Panjab Singh, president of the National Academy of Agricultural Sciences in Delhi, told the newspaper The Hindu, “We are worried about the impact on farmers’ income, as well as food security.”

Smith concurs. “It was a political move, not a scientific move,” she says, adding that the natural farming approach has “not been properly trialled”. To assess the technique, she and her colleagues modelled the long-term impact of ZBNF on soil health. They found that the approach could meaningfully and sustainably improve nitrogen levels for low-yield lands, but that it would offer little benefit to farms already achieving high yields⁶. They concluded that a more targeted implementation of ZBNF is needed to protect overall national food security. Smith remains largely positive about ZBNF, which has been gaining momentum among farmers. “There’s a lot of good things about it, but it needs more science,” she says.

Outside national initiatives, smallholder sustainable intensive farming requires targeted investment and efforts to support social and economic stability. Vanlauwe contends that, in many parts of sub-Saharan Africa, environmental and political conditions mean that many farmers will continue to struggle at the margins for the foreseeable future. Still, he sees a path towards economic mobility. “Give them access to credit they pay back over time, and invest in integration and value-chains so they can get rid of or sell excess produce,” he says. “It’s about creating incentives and access systems.”

But durable change also requires building local expertise in crop and soil research, and in ecosystems. Many specialists in these areas are also involved with international education and training. For example, as director of the Feed the Future Innovation Lab for Collaborative Research on Sustainable Intensification, Prasad has helped to coordinate undergraduate- and graduate-level agriculture programmes in places such as Senegal, Cambodia and Bangladesh. Normally, these programmes take on a few dozen students at a time, but the shift to online training as a result of the coronavirus pandemic could prove to be a long-term gain for capacity building. “We are now talking to about 500 or even 1,000 students,” he says.

Michael Eisenstein is a science journalist in Philadelphia, Pennsylvania.

1. Cassman, K. G. & Grassini, P. Nature Sustain. 3, 262–268 (2020).
2. Pretty, J. M. Natural Res. Forum 21, 247–256 (1997).
3. Ricciardi, V. Glob. Food Security 17, 64–72 (2018).
4. Pretty, J. et al. Glob. Sustain. 3, e23 (2020).
5. Cui, Z. et al. Nature 555, 363–366 (2018).
6. Smith, J., Yeluripati, J., Smith, P. & Nayak, D. R. Nature Sustain. 3, 247–252 (2020).

Cultivating a sea change

Can aquaculture overcome its sustainability challenges to feed a growing global population? By Sarah DeWeerdt

Seaweed farmers in Tanzania tend to their crops. Not only is seaweed a nutritious food, but cultivating it can help to ease ocean acidification.
TOMMY TRENCHARD/PANOS PICTURES

On a summer morning in 2019, Andy Suhrbier pilots a small aluminium boat out to a mussel raft in a quiet cove on the eastern shore of Puget Sound in Washington State. As the boat approaches, a mother seal and her pup resting on the raft slip into the water. Suhrbier climbs from his boat onto the raft; the only sign of life is a vague smell.

Suhrbier tugs on a couple of ropes attached to one of the raft’s beams. Soon, a mesh-lined plastic cage emerges with water and silt pouring out of it. He picks off several sea stars and tosses them back into the water, then flips open the lid like a pirate opening a treasure chest.

Inside is more dark sediment, mostly waste from the mussels, the source of the smell. Suhrbier sifts through it. He is looking for something.

“Look at this monster!” he says, holding up a sea cucumber nearly a foot long. Its deep red body covered in orange bumps stands out from the muck like a gold doubloon. “That’s definitely market size.”

Suhrbier is a biologist with the Pacific Shellfish Institute in Olympia, Washington, a non-profit research organization that works to promote healthy wild shellfish populations and sustainable shellfish aquaculture along the US west coast. Two years earlier, he had put sea cucumbers in cages and suspended them beneath the mussel raft, as part of an effort to develop aquaculture in Puget Sound. The hefty size of the cucumbers is a promising sign.

Suhrbier and his colleagues think that sea-cucumber farming could have two benefits. First, the animals could help to prevent excess waste from building up underneath aquaculture installations, such as mussel rafts or net pens used to hold bony fish such as salmon. (Sea cucumbers, soft-bodied animals related to sea urchins, move slowly over the sea floor eating detritus: the vacuum cleaners of the ocean.) Second, a ready source of farmed sea cucumbers could reduce the poaching of wild stocks to feed the growing market in east and southeast Asia.

Globally, aquaculture produced 82.1 million tonnes of aquatic animals in 2018, and wild fisheries produced 97 million tonnes, according to the United Nations’ Food and Agriculture Organization (FAO). But the value of farmed fish was higher, around US$250 billion compared with $151 billion for wild-caught fish. Aquaculture production of animals is projected to increase by one-third by 2030, reaching 109 million tonnes, and will supply the majority of aquatic protein in people’s diets by 2050.

“We need to grow the amount of seafood available, as world populations grow, to provide enough protein for everybody,” says Monica Jain, founder of Fish2.0 in Carmel, California, an organization that promotes investment in sustainable seafood businesses. With the catches from wild fisheries remaining largely flat and some stocks already overexploited, “aquaculture is really the only way to do that”. But as the industry grows, Jain and other aquaculture advocates want to make sure that it does so sustainably.

Double alchemy

Aquaculture is a relatively small proportion of the global food system: terrestrial meat production (both livestock and wild game) totalled around 342 million tonnes in 2018, and production of grains and cereals was 2.7 billion tonnes. However, aquaculture is more diverse, particularly in terms of the animals farmed. These range widely across taxonomic groups, including bony fish (carp, tilapia and salmon, for example), crustaceans (shrimp, prawns and crayfish), molluscs (clams, oysters and mussels) and echinoderms (sea cucumbers). Various species of seaweed are also gathered. There are freshwater, saltwater, brackish water and self-contained terrestrial aquaculture systems. And each has its own sustainability benefits and challenges.

One subsector that offers huge environmental advantages and has no equivalent in terrestrial agriculture is non-fed aquaculture. Marine bivalves, such as clams, mussels and oysters, get their nourishment by filtering microscopic plants, detritus and nutrients from the water that surrounds them. They require minimal inputs and can even improve the water quality. In this sense, Suhrbier’s sea cucumbers represent a kind of double alchemy: non-fed aquaculture species grown on the wastes of other non-fed aquaculture species.

Similarly, cultivated seaweeds can remove excess nutrients, such as nitrogen, that contribute to the formation of areas of oxygen-poor water where marine life has difficulty surviving, known as dead zones. By taking up carbon they can also help to alleviate ocean acidification at a local scale. Moreover, “seaweeds are incredibly nutritious,” says Alecia Bellgrove, head of the Deakin-Seaweed Research Group at Deakin University in Melbourne, Australia. “They are, for example, fantastic sources of trace minerals, which are often lacking in our diets based on terrestrial foods.”

Aquatic animals that require feed (mainly prawns and bony fish) also have an environmental advantage over animals raised in terrestrial agriculture. Because most are cold-blooded, they convert food into body mass more efficiently than birds and mammals, which need energy to help regulate their body temperature. So it takes less feed to produce a kilogram of salmon, for example, than it does to produce a kilogram of, say, beef or pork.

However, some of the most lucrative aquaculture species are carnivorous, and therefore sit higher in the food chain than any terrestrial species raised in agriculture. Take the Atlantic salmon (Salmo salar), for example. In the mid-2000s, salmon aquaculture, now a $15.4-billion industry, was growing rapidly. Feeding the salmon demanded an increasing share of the world’s fish meal and fish oil, which was sourced from small forage fish, such as anchovies, sardines and capelin. But while demand from the salmon farms grew, fishing yield for the forage species remained relatively flat. It took at least 4 kilograms of wild-caught forage fish to produce just 1 kilogram of salmon.

From an environmental point of view, “It made no sense,” says Scott Nichols, founder of Food’s Future, a consultancy in West Chester, Pennsylvania, which promotes the development of sustainable aquaculture businesses. As a biochemist working at US chemical company DuPont in the mid-2000s, Nichols helped to develop a way to produce omega-3 fatty acids from yeast. The fatty acids were then incorporated into salmon feed to replace some of the wild-fish component. The new feed was tested through a partnership between DuPont and the aquaculture firm AquaChile based in Puerto Montt, Chile, in the form of the salmon producer Verlasso in Miami, Florida.

“We were able, after a couple of years of production, to get to the point where for every kilo of salmon that was produced, we were using less than a kilo of wild-caught fish,” Nichols says. “So our farming practices resulted in the net production of fish on the planet.”

Other companies soon joined in, producing omega-3s in genetically engineered canola oil or single-celled algae. Meanwhile, fish-oil and fish-meal producers are increasingly making use of fish trimmings and other by-products that previously went to waste. Fish meal and fish oil, which are still used in a variety of aquaculture feeds, as well as in products such as food supplements, accounted for around 10% of the world’s total fish production in 2018, according to the FAO. Nonetheless, Nichols takes heart from developments. “What looked on the face of it to be dismal in 2006 now looks to be very promising,” he says.

Disease detectives

An increasingly important threat to aquaculture sustainability is disease, which affects all subsectors of aquaculture and causes an estimated $6 billion worth of aquatic animal losses every year. Diseases include parasites called sea lice in salmon; white spot syndrome virus in prawns, which emerged in the early 1990s and devastated prawn farming throughout Asia before spreading to the Americas; and tilapia lake virus, which threatens the economic and nutritional gains that freshwater aquaculture has made possible in many low- and middle-income countries.

As aquaculture is scaled up, the problem of disease will also become greater. “As you expand the volume of production, you are going to get significant losses,” says Grant Stentiford, a pathologist and head of aquatic animal health at the Centre for Environment, Fisheries and Aquaculture Science, Weymouth, UK. “You’ve used up potentially large amounts of resource to get absolutely nowhere.”

To deal with such threats, some large producers who supply the export market are moving to self-contained, land-based systems. Others are moving away from the coast into deeper waters that might dilute the threat of disease. Vaccines have also made a difference, reducing not only the threat of many fish diseases, but also antibiotic use, another major environmental concern about the industry. And high-throughput sequencing of the microbial DNA in aquaculture systems could provide early warning of disease outbreaks.

But many of these solutions are expensive and, therefore, out of reach for the small and medium-sized producers who make up the majority of the global aquaculture industry, producing food for subsistence or local markets in low- and middle-income countries.

Moreover, diseases that threaten aquaculture are emerging every three to five years
on average. The dearth of knowledge about aquatic pathogens makes diseases hard to predict and spot.

It can also be a challenge to deduce their cause. For example, ice-ice disease results in bleaching of Kappaphycus seaweed, which is grown in large amounts in southeast Asia and Tanzania for the production of food additives, such as the thickening agent carrageenan. The disease has caused yields to plummet over the past decade, but “the causative agent is still not known”, says Valéria Montalescot, senior project manager for GlobalSeaweedSTAR, a four-year research project based at the Scottish Association for Marine Science in Oban, UK, which aims to boost knowledge about seaweed cultivation in low- and middle-income countries. Kappaphycus is usually grown from cuttings, so the whole crop across multiple countries might be the result of just a few clones, possibly making it more vulnerable to disease, Montalescot adds.

Diverse yields

Climate change is complicating efforts to fight disease. Higher water temperatures can alter the microbial community of a body of water, encouraging the growth of pathogens, as well as stressing organisms and making them more vulnerable to disease. One suggested cause of ice-ice disease is that temperature-stressed seaweeds release compounds that attract bacteria, for example.

And temperature is not the only issue. Both increased rainfall and salinity intrusion from sea-level rise can alter water chemistry in ways that are detrimental to aquaculture organisms. Storms can destroy aquaculture crops or infrastructure in the water and on land. “From an environmental point of view I think climate change is the greatest challenge” for the sustainability of aquaculture systems, says Nesar Ahmed, who studies global seafood sustainability at Deakin University.

Climate change also intersects with aquaculture’s pressure on water and land resources. Inland aquaculture demands 429 cubic kilometres of fresh water each year, much less than the demand from terrestrial agriculture, but still enough to pose a strain on increasingly drought-prone areas.

In south and southeast Asia, prawn cultivation has contributed to the destruction of 38% of the world’s mangrove habitats, which have a variety of important ecological functions, including sequestering carbon and buffering coastlines from storms and sea-level rise. The loss of mangroves has also resulted in saltwater intrusion rendering inland areas unsuitable for terrestrial agriculture.

Some farmers are now producing prawns among intact mangrove stands. Although there are concerns that this practice might also damage the health of the mangroves, it is part of a larger trend to create aquaculture systems that include multiple species and involve interrelationships more like the ones that keep natural ecosystems in balance.

Some examples of this integrated aquaculture are long-established, such as stocking rice paddy fields with fish or prawns. The animals eat pests and fertilize the rice crop, increasing rice yields and providing an extra source of protein or income for small-scale farmers, Ahmed says. Growing two species in a single body of water also reduces overall water use.

This type of rice–fish system has been practised for hundreds of years in China and has been designated a Globally Important Agricultural Heritage System by the FAO, a designation that aims to preserve agricultural knowledge that can contribute to a more sustainable and resilient food system.

Large-scale aquaculture operations, such as Cooke Aquaculture based in Blacks Harbour, Canada, have also been experimenting with multi-species systems. The company keeps salmon in net pens near both mussel and kelp rafts in the Bay of Fundy, Canada.

In theory, integrated aquaculture can help to increase yields and decrease risk by diversifying operations, and is generally a more environmentally sound form of aquaculture. But in practice, it can be difficult to quantify these benefits. For example, because nitrogen moves freely through water, it is difficult to track how much of the excess nitrogen produced by bony fish is taken up by seaweed growing nearby. And then there are the complexities of managing an operation with multiple species: not just producing them but also harvesting, processing and marketing them.

Sea cucumbers are retrieved from the mesh-lined cages at Puget Sound in Washington state.
SARAH DEWEERDT

Suhrbier knows such difficulties well. The sea cucumbers he and his team harvested from under the mussel raft were the right size, weight and colour for the export market, but the mussel producer he was working with was unable to renew its permit at that location. The raft was lost, and with it Suhrbier’s chance of follow-up experiments to develop sea-cucumber aquaculture techniques. “I was really shocked and saddened to see that go because it was one of those places where it just makes a lot of sense for sea cucumbers to be,” Suhrbier says. The new location of the producer’s rafts isn’t a good habitat for sea cucumbers.

Suhrbier is still experimenting with growing sea cucumbers alongside other types of aquaculture operation around the Puget Sound area. But, like an increasing number of aquaculture researchers, he is beginning to think that producing the animals needs to move in a simpler and more radical direction. Growing sea cucumbers in cages is labour intensive. What if the animals are placed in the vicinity of aquaculture operations and left to roam freely, like a marine equivalent of a ranch or even a permaculture system?

“If we could mainly enhance the wild population around these areas, I think that would be a great benefit for everybody,” Suhrbier says. “I’m trying to have something that fits in: easy, cost effective and as passive as it can be.”

Sarah DeWeerdt is a freelance writer in Seattle covering biology, medicine and the environment.

Eating for better mental health

Food, mood and brain function are thought to be connected through the
community of microorganisms that live in the gut. By Clare Watson
If you want to do right by the planet, the general advice is to eat an abundance of fruit and vegetables, as well as whole grains and nuts, and to consume less meat, dairy and processed foods. Add some high-fibre fare, fermented food and fish a few times a week, and you could be eating your way to better mental health, too.

Those are the recommendations from nutritional psychiatry research. The field is built around growing evidence from population studies and clinical trials in the past decade suggesting that dietary improvements might not only improve mood but also treat common and severe mental illnesses. Traditional diets consumed by people in places such as the Mediterranean, Norway and Japan are associated with a lower risk of depression — one of the most common mood disorders — and, to a lesser extent, anxiety. A change in diet can alleviate symptoms of depression, even among people with severe forms of the condition.

Researchers have also been investigating how the trillions of microorganisms in the human gut communicate with the brain to influence the processes that take place there. Imbalances in the gut microbiome have been linked to a range of neurological disorders, including Alzheimer’s disease, autism spectrum disorder, multiple sclerosis, Parkinson’s disease and Huntington’s disease.

Brain function, mood and mental health seem to be intricately linked to what people eat through the gut–brain axis, which is modulated by the gut microbiome.

That there is a connection between the food people eat and the brain is “something that ancient wisdom has been telling us for a long time”, says Amy Loughman, a gut-microbiome researcher at Deakin University’s Food and Mood Centre in Geelong, Australia. “But it’s really exciting that science is beginning to prove how and why.”

Illustration of bacteria in the intestine. (SCIEPRO/SPL)

The gut microbiome is proving to be extremely complex and sensitive. Changes in diet can alter its composition in days. But the data from human trials on dietary interventions designed to shift gut microbiota to improve mental health are mixed, according to one meta-analysis¹. Some studies of probiotics — live microbes that, when introduced in adequate quantities, confer health benefits on the host — have had compelling results, such as reduced symptoms of depression in women who have recently given birth. But others have shown no more effect than a placebo. Comparisons are difficult owing to the varying doses and strains of bacteria used in trials. Similarly, although promising, the evidence from human trials testing specific prebiotic foods — those that are rich in high-starch dietary fibres that stimulate beneficial bacterial colonies in the gut — is insufficient to draw clear conclusions.

At the Food and Mood Centre, researchers focus on the entire diet, rather than individual ingredients or certain strains of bacteria delivered in probiotic supplements. “There are no superfoods that are going to guarantee positive mental health,” says Loughman. “It’s both simpler and more complex than that.”

To support mental and brain health, high-fibre foods such as fruit, vegetables and whole grains are recommended, Loughman says. In these foods, the fibre and starches that are resistant to digestion in the small intestine promote the growth of beneficial bacteria, which guard against inflammation by protecting the gut wall. The bacteria also produce the short-chain fatty acids that are thought to have a key role in crosstalk between the gut and the brain. Chemicals such as polyphenols in plant-based foods and omega-3 fatty acids in fish have reported mental-health benefits, too. To get them all, people need to eat a well-rounded diet.

On the question of what mix of microbes makes for a healthy microbiome, diversity is thought to be important. People with a greater variety of bacteria in their gut seem to be healthier. A Western diet of processed foods that is high in fat and sugar and low in dietary fibre and micronutrients seems to be detrimental to both the gut and the mind, reducing the diversity of the gut microbiome, increasing inflammation and elevating the risk of depression.

The long-running SUN (Seguimiento University of Navarra) cohort study has been recruiting university graduates in Spain since 2000 to analyse associations between dietary patterns and health, including depression. One of its findings is that the more ultra-processed foods (usually, energy-dense foods that are significantly altered from their original state) you eat, the greater your risk of depression².

Over the past decade, observational studies of this kind have consistently shown that well-rounded diets that are low in ultra-processed foods confer some protection against depression. But public-health researcher Almudena Sánchez Villegas at the University of Las Palmas of Gran Canaria, Spain, says more randomized controlled trials testing specific dietary interventions are needed. This will allow researchers to refine the International Society for Nutritional Psychiatry’s existing dietary guidelines³ for preventing depression, of which Sánchez Villegas is a co-author. Because individual responses to food seem to be mediated by a person’s gut microbiome, researchers will need larger trials to work out who is likely to benefit from dietary interventions before these can be prescribed in the clinic. Until then, the public-health message is simple: everyone stands to benefit from a balanced diet with plenty of fruit, vegetables and fibre, and few processed foods. “That’s universally good for people,” says Loughman.

Clare Watson is a freelance science journalist in Wollongong, Australia.

1. Vaghef-Mehrabanya, E. et al. Clin. Nutr. 39, 1395–1410 (2020).
2. Gómez-Donoso, C. et al. Eur. J. Nutr. 59, 1093–1103 (2020).
3. Opie, R. S. et al. Nutr. Neurosci. 20, 161–171 (2017).

outlook
SOPHIE CASSON FOR NATURE

Cell-based meat with a side of science
Growing a burger in a laboratory is one challenge, growing an
industry to do it at scale is quite another. By Elie Dolgin
When Laura Domigan started her research group at the University of Auckland, New Zealand, in 2015, she hoped to continue her work developing protocols for growing cell-based meat in the laboratory. But with funding for cultivated-meat research practically non-existent in academia at the time, Domigan pivoted to working on biomedical materials for use in tissue engineering. A protein biochemist by training, she focused her efforts on creating artificial corneas for eye surgery — a far cry from anything resembling a lab-grown steak.

Still, she never gave up on her dream of studying in vitro meat. “I had to be super patient and keep trying,” Domigan says. And although it took several years, Domigan’s strategy eventually paid off.

Initially, she secured funding for a PhD student to begin developing formulations of nutrient media to grow cell-based meat. Then, in October 2020, a team led by Domigan won a multi-million dollar grant from the New Zealand and Singaporean governments to explore questions such as which cells are the best starting material for cultured meat, and is the nutritional profile of meat grown in a lab equivalent to the real thing. “There is so much research that needs to be done,” Domigan says. And much of it is only beginning to happen, at least in any sort of transparent way.

Investors have poured hundreds of millions

of dollars into cultured-meat research in the past few years, bringing hype and breathless news coverage about an agricultural revolution that could bypass the environmental and animal-welfare issues of conventional meat production. One estimate by the consulting firm Kearney in Chicago, Illinois, suggests that 35% of all meat consumed globally by 2040 will be cultured — a change that is projected to reduce greenhouse-gas emissions and antibiotic use. And thanks to the COVID-19 pandemic, which revealed crucial weaknesses in global food-supply chains, some people now expect the transition to cell-based meat to happen even faster. Earlier this month, a US start-up called Eat Just announced that its chicken bites — which are 70% cultured chicken cells, with plant protein added for structure and flavour — had won regulatory approval for sale to consumers in Singapore, a global first for the cultivated-meat industry.

Scientists worry that the commercial push to bring palatable products to market means that fundamental studies are either not happening or remain cloaked in trade secrecy. Start-ups have made splashy demonstrations of their lab-grown chicken nuggets, pork sausages, steak strips and seafood dumplings. But these show only that companies “can do this on a small level”, says Abhi Kumar, a venture partner at Lever VC, a New York-based venture capital fund that focuses on alternative-protein start-ups. The challenge now, he says, is making it work at scale.

Improvements in cell source material and the nutrient media required to fuel cell growth are needed, as are scaffolds to support 3D tissue structures. Next-generation bioreactor platforms that can grow huge numbers of cells at high densities are also a must. These are costly undertakings — so much so that many in the field are dubious that private financing can support them, and still yield an affordable product.

That’s why leading thinkers in cellular agriculture, such as Erin Rees Clayton at the Good Food Institute (GFI), argue in favour of more open science and public investment. “There’s a need for public-sector research in these areas,” says Rees Clayton, who is associate director of science and technology at the GFI, a non-profit think tank in Washington DC. “There’s a lot of room in the pre-competitive space for more work to be done in an open way, so we can all benefit from it and move forward more quickly.”

To fill the funding gap, the GFI created a research-grant programme that has given out close to US$3 million over the past 2 years to 16 research teams working on cultivated-meat projects. Academic institutions are beginning to hire with the nascent discipline in mind — the Technical University of Munich in Germany, for example, is accepting applications for a professorship in cellular agriculture. And, as evidenced by Domigan’s funding success, governments too are heeding the call for financial support.

The field of cellular agriculture is beginning to take on some of its biggest scientific and engineering challenges, and scientists from a range of backgrounds are entering the fray.

“This cellular-agriculture research is the stuff that gets me up in the morning,” says Glenn Gaudette, a biomedical engineer at Worcester Polytechnic Institute in Massachusetts. For almost 20 years he has studied scaffold technologies for heart-regeneration therapies; now he is applying his expertise to the problem of how to grow meat. “Does it pay the bills? No, not yet — hopefully, one day — but it’s really exciting.”

The fight for funding

In the early 2000s, the US space agency NASA briefly supported efforts to grow goldfish muscle in the lab as a potential source of protein for astronauts on long missions. A few years later, the Dutch government sponsored a €2-million ($2.3-million) project to cultivate pork meat from stem cells — research that, with an extra €250,000 infusion from Google co-founder Sergey Brin, eventually led to the field’s highest-profile moment so far, when vascular biologist Mark Post at Maastricht University in the Netherlands, unveiled the world’s first cultured burger, in 2013.

But aside from intermittent funding opportunities to explore the social ramifications of producing meat from cell cultures, there have been few other public grants for cultivated-meat research. Government bodies stayed away from the field in large part, says Kate Krueger, former research director at the non-profit organization New Harvest in Cambridge, Massachusetts, because the science was unproven and crossed disciplines in ways that defied the conventional dividing lines of funding-agency bureaucracy.

“Cellular agriculture falls into this funding no-man’s land between biomedical research and agricultural research,” says Krueger, who now runs a consulting firm, also in Cambridge, centred around cellular agriculture.

Money from the GFI and New Harvest has plugged the funding gap to some extent. But the situation is changing. As scientific interest in the subject grows and grant applications increase, governments have begun to inject more cash into the field. Several large grants have been issued in the past few months alone. In November, for example, the government agency Flanders Innovation and Entrepreneurship began funding a €2.1-million, 4-year project called CUSTOMEAT, run by scientists at Ghent University and KU Leuven, both in Belgium. In the United States, the National Science Foundation (NSF) awarded around $3.5 million in September to back a cultivated-meat consortium at the University of California, Davis, for the next five years.

“Our hope is that we can provide basic knowledge and build a trained workforce,” says chemical engineer David Block, who is leading the Davis effort. “Those are the kinds of things that you need to grow an industry.”

Experts say that many cultivated-meat companies will probably over-promise and under-deliver. But academic science can help “keep credibility alive,” says Johannes le Coutre, who led a research group at the Swiss food giant Nestlé before joining the University of New South Wales in Sydney, Australia, in 2019 to run a lab dedicated to cellular agriculture.

Amy Rowat, a biophysicist at the University of California, Los Angeles, notes that academia also offers the intellectual freedom for researchers to work on exploratory projects, using expertise in basic science to come up with innovative ways of thinking, or to tackle questions not directly related to product development but still significant to the overall domain. And according to David Kaplan, a bioengineer at Tufts University in Medford, Massachusetts, the next generation of scientists entering the field are “totally motivated to make a difference”.

“I have never seen such driven, passionate students in all my decades of doing this,” he says. Andrew Stout is a case in point. A PhD student in Kaplan’s lab, Stout is rethinking the entire process of cellular agriculture, starting with the most basic ingredient: the muscle cells themselves.

Most companies growing lab meat either use cells taken directly from animal biopsies or cell lines that have spontaneously become immortal through natural mutations that allow indefinite proliferation in the lab. Few firms will consider genetically manipulating the cells for optimal performance because of a fear of consumer backlash. But Stout realized that genetic engineering offered a path to achieving the nutritional promise of cultivated meat.

Starting material

He inserted three genes into cow muscle cells¹. Each encoded an enzyme involved in

the synthesis of an antioxidant that mitigates diseases associated with consuming red and processed meats, such as colon cancer. The enzymes might also help with the manufacture of cultured meat, because the unstable molecules that antioxidants attack reduce the proliferation of some lab-grown cells. If consumers are willing to accept these types of DNA enhancement, “genetic and metabolic engineering can offer a lot of impact and benefit to cultured meat”, Stout says, “and could even allow us to create novel foods that we couldn’t get any other way”.

Producing products such as the lab-grown meat burger at scale is still a long way from reality. (DAVID PARRY/PA IMAGES/ALAMY)

Other researchers are reconsidering whether the cells that go into cultivated-meat products need to come from species that are already commonly consumed in Western cultures. For example, Natalie Rubio, another Tufts graduate student, has explored growing meat from insect cells to create products that can be designed to taste like crab, prawns and other seafood. Using muscle cells from fruit flies (Drosophila melanogaster)² and the caterpillar of the moth Manduca sexta, Rubio showed that insect cells are easier and cheaper to grow than cells from conventional livestock species, and might also have nutritional advantages.

Meanwhile, other scientists hope to rally the research community around the idea of cell-based zebrafish fillets, at least as a vehicle for accelerating advances in the field. The zebrafish (Danio rerio) is an established model organism for studying the genetic, neuronal and behavioural basis of disease. Alain Rostain, executive director of Clean Research, a non-profit organization in New York, now wants to make it the go-to species for basic-research projects in cellular agriculture as well. “There’s a lot of fundamental understanding that’s not there yet,” he says. “We need the participation of a lot of people to just think freely through the science together.”

And, as Rostain and his colleagues have described³, researchers can benefit from the extensive molecular toolkit already established for zebrafish. Plus, as a lean fish with little fat content in the muscle, zebrafish fillets should be easier to produce than comparable lab-grown cuts of fat-laced salmon, tuna, beef or pork. Cultivated zebrafish will probably taste similar to white fish, such as cod or haddock, Rostain says, and discoveries made with zebrafish should translate to any other edible species.

Regardless of the starting material, all cells will require an optimized growth medium — the rich broth of chemicals and proteins needed to support proliferation and differentiation. Companies have already devised ways to eliminate the nutrient-rich fetal bovine blood that is the cornerstone of most in vitro culture media, creating the slaughter-free products that the industry demands. But this serum-free media is too expensive for cultivated meat to be affordable on the supermarket shelves. “It’s difficult to find a cost-efficient option,” says Ka Yi Ling, chief scientific officer at Shiok Meats, a cell-based seafood company in Singapore.

Media matters

According to an analysis by the GFI⁴, growth media currently make up the bulk of total production costs for cultivated meat, and proteins known as growth factors are the most expensive ingredient. Costs are coming down, as start-ups dedicated to serving the cellular-agriculture industry devise ways to manufacture these products. But as Matt Anderson-Baron, co-founder and chief scientific officer at one such company, Future Fields in Edmonton, Canada, concedes, “There’s still so much to be done on the optimization and discovery front.”

Assuming researchers find the right cell line and growth medium combination, they then have to grow those cells on a scaffold — ideally one that is edible so that it doesn’t have to be removed from the final product. For ground-meat products, such as burgers and sausages, small beads known as microcarriers can provide the surface properties needed by most muscle and fat cells for growth. But for anything with a more complex meat structure — a steak or an Iberian ham, for example — a more sophisticated tissue-engineering approach is required.

One option comes from a team at Harvard University in Cambridge, Massachusetts. Bioengineer Kevin Kit Parker and his colleagues have developed a spinning technique that works like a candy-floss machine to extrude long, thin fibres from gelatin⁵. The researchers put the gelatin, a protein product derived from collagen, into their machine and produced tiny threads — narrower than the width of a hair — that closely matched the architecture of fibres found in muscle tissue.

Last year, Parker and his colleagues showed that rabbit and cow muscle cells grown on the fibrous gelatin line up with the proper orientation⁶. The cells were still not as densely packed as real muscle, but Parker, together with three of his postdocs and students, has since created a company called Boston Meats to improve the technology further. “Now, with our scaffolds,” he says, “you can move from hamburger to fillet.”

Elsewhere, researchers have made scaffolds out of foods such as textured soya protein and various vegetables stripped of their cellular material so that only supporting structural

sugar molecules and proteins remain. At the University of Ottawa in Canada, for example, biophysicist Andrew Pelling and his students have taken decellularized stalks of celery and shown that the grooves created by its natural structure help to promote the patterning and alignment of muscle cells⁷. And at Worcester Polytechnic, Gaudette’s team has grown fat and muscle cells on cell-free spinach leaves — the plant’s branching network of delicate veins provides ideal conduits for the nutrient medium to reach every cultivated meat cell.

Because muscle and fat cells require different growth media, however, researchers typically culture the two cell types separately, each on its own scaffold in a different nutrient bath. Some researchers, including Rowat, have devised strategies to then interlace the muscle and fat to achieve the flavour of a well-marbled steak. “The cells actually fuse together with the other partnering scaffold type on the time scale of hours to form these composite structures,” Rowat explains. In unpublished work, she has created miniature marbled steaks out of mouse and rabbit cells, and has begun to work with cells from pigs and cows, as well.

Ka Yi Ling and Sandhya Sriram are co-founders of Shiok Meats in Singapore.

But even with the latest scaffolding strategies, some muscle biologists worry that key aspects of tissue physiology are still being discounted. “This incredible focus that we have as an industry on cell division ignores fusion and maturation,” says James Ryall, chief scientific officer at Vow, a start-up in Sydney working on cell-based meat from animals such as kangaroo and alpaca.

To form muscle tissue, thousands of individual precursor cells must first fuse together to become long myotubes. These cells require physical stimuli to mature into myofibres. Only then will muscle grown in the laboratory have the texture and nutritional properties of real meat, says Lieven Thorrez, a muscle-tissue engineer at the Kortrijk campus of KU Leuven. “And that is a process that takes time. You cannot just differentiate cells over a period of a few days and say the myofibres will be the same as those found in an adult animal,” he says. “That is largely overlooked.”

Scaling up

With so many scientific issues to resolve, cell-based meat research, whether in academic or private labs, remains at the experimental stage. For commercial viability, the industry will need to find ways to produce tissue at a massive, and unprecedented, scale.

Tissue engineer Che Connon at Newcastle University, UK, estimates that feeding the world’s population with lab-grown meat would necessitate building systems for growing on the order of a septillion (10²⁴) cells annually, something that is not possible with the types of batch bioprocessing techniques currently used in mammalian-cell-based manufacturing. Global capacity could fulfil about one-billionth of that requirement. “It’s a massive limiting factor,” says Connon, who has developed a type of continuous cell-bioreactor platform that he plans to commercialize through a spin-off company called CellulaREvolution.

Meanwhile, computer scientist Simon Kahan, president of life-sciences software company Biocellion SPC in Seattle, is leading a team called the Cultivated Meat Modeling Consortium that formed in 2019 with the aim of optimizing bioprocessing techniques through modelling techniques. With funding from the German technology company Merck, the consortium has developed a proof-of-concept model of a stirred-tank bioreactor, involving little more than a spinning rotor and free-floating minute beads for growing muscle cells.

The bioreactor might be fairly rudimentary, but the consortium’s computer modelling is anything but. The simulations of fluid dynamics and cellular biomechanics have revealed a central challenge to growing muscle or fat cells at scale. “You have these two competing interests,” says Kahan. To support nutrient and gas exchange, “you’re trying to keep things well mixed while at the same time trying to subject the cells to very little mechanical fluid stress”. With input from industry partners, the consortium plans to build more complexity into its models to inform the design of bioreactors in the real world.

Block’s group, with its large NSF grant, isn’t even attempting to work on bioreactor designs; there is plenty to keep his team busy tackling the issues of cell lines, media and scaffolds, as well as conducting feasibility assessments of the cell-based meat industry. “To me,” says Block, “it’s not clear yet that this is going to be a viable alternative” — whether from a technical, economic or sustainable standpoint. But with each new grant or research team entering the field, the goal of a perfectly grilled, medium-rare steak grown from cells gets a little closer to becoming a reality.

Elie Dolgin is a science journalist in Somerville, Massachusetts.

1. Stout, A. J., Mirliani, A. B., Soule-Albridge, E. L., Cohen, J. M. & Kaplan, D. L. Metab. Eng. 62, 126–137 (2020).
2. Rubio, N. R., Fish, K. D., Trimmer, B. A. & Kaplan, D. L. ACS Biomater. Sci. Eng. 5, 1071–1082 (2019).
3. Potter, G. et al. One Earth 3, 54–64 (2020).
4. Specht, L. An Analysis of Culture Medium Costs and Production Volumes for Cultivated Meat (The Good Food Institute, 2020).
5. Reza Badrossamay, M., McIlwee, H. A., Goss, J. A. & Parker, K. K. Nano Lett. 10, 2257–2261 (2010).
6. MacQueen, L. A. et al. npj Sci. Food 3, 20 (2019).
7. Campuzano, S., Mogilever, N. B. & Pelling, A. E. Preprint at bioRxiv https://doi.org/10.1101/2020.02.23.958686 (2020).

outlook
Research round-up
Highlights from sustainable-nutrition research. By Dyani Lewis

Farming trends deplete pollinators

Most cultivated crops depend on insect pollinators, such as bees, but global crop trends are leaving pollinators worse off. Using data from the United Nations’ Food and Agriculture Organization, an international team, led by Marcelo Aizen at the National University of Comahue in Rio Negro, Argentina, assessed changes in the amount of land used for agriculture and the types of crops cultivated between 1961 and 2016. During that time, the area of land used to grow crops increased by around 40%, and pollinator-dependent cropland more than doubled. Soya bean, rapeseed and oil palm — crops associated with deforestation and diversity loss — account for much of the expansion and for the increase in pollinator dependence.

Rapeseed crops depend on pollinators such as bees. (FOTOKOSTIC/ISTOCK/GETTY)

But although the land used has increased, crop diversity has remained largely the same since 2000. Producers have opted for large-scale cultivation of one crop. That’s a problem because monocultures don’t provide pollinators with a stable, year-round supply of food. This ultimately leads to a fall in insect numbers, lower yields and increased deforestation as demand for land surges.

Greater reliance on crops that are dependent on single-species pollinators, coupled with declining pollinator populations, could cause problems for food security. Poorer regions will be the hardest hit by crop failures, but higher-income countries that rely on imported food will also be affected.

Rotating a diverse range of crops on a single piece of land could help to stem the decline in pollinator populations. Planting native flowers and hedgerows on agricultural land and restoring neighbouring natural environments could also preserve pollinator habitats.

Glob. Change Biol. 25, 3516–3527 (2019)

US household food waste calculated

Working out how much food goes uneaten in an individual household is notoriously difficult. Comprehensive data on how much food ends up in the bin does not exist. But Yang Yu and Edward Jaenicke at Pennsylvania State University in University Park used a new method to overcome the lack of data.

Instead of trying to measure food waste directly, Yu and Jaenicke calculated a household’s ability to efficiently convert food brought into the household into the energy required to maintain the body weight of its residents. First, they obtained data on food purchases from around 4,000 households that took part in the 2012 US Department of Agriculture’s National Household Food Acquisition and Purchase Survey. The authors then calculated the metabolic energy requirements of the people living in each household from attributes such as height, weight, age and gender. The amount of food waste was estimated according to the difference between the household’s food inputs and its members’ energy requirements, not accounting for overeating.

The study showed that the average household wasted close to one-third of the food that it bought, which means that the United States wastes an estimated US$240 billion worth of food per year. The most efficient household in the study wasted about 9% of its food. Healthier diets created more waste than unhealthier diets, owing to the greater proportion of fruit and vegetables. Higher-income households wasted about 50% more food than lower-income households,

and small households wasted iron, vitamin A, vitamin B12 identified counties that food waste and management of
more per person than large and folate production. Food could recycle nitrogen and urban sprawl.
households. production often fell short in phosphorus nutrients at the The authors assessed each
multiple nutrients. More than local county level, as well as of the actions against the
Am. J. Agric. Econ. 102, 525–547 70% of countries with nutrient four regional manuresheds — United Nations’ 17 Sustainable
(2020) shortfalls produced inadequate in the northwest, southwest, Development Goals (SDGs),
amounts of iron, vitamin A and central and southeast United as well as 18 measures from
folate. And more than one-fifth States — where clusters of the Nature’s Contributions
Hidden hunger a of those not producing enough source counties could join to People (NCP) framework,
nutrients fell short by more together to develop sustainable which was drawn up by
global problem than half of what was necessary redistribution programmes scientists associated with the
There is more than enough food for their population. over longer distances. The work Intergovernmental Science-
to feed the global population. The authors suggest that suggests a pathway towards Policy Platform on Biodiversity
But local patterns of countries with nutrient removing manure from areas and Ecosystem Services in 2017.
production still leave 10% of the deficiencies could prioritize where it can pollute the local This framework is intended
world’s people with insufficient the production of foods environment and delivering it to recognize nature’s social,
calories, and more than half that contain the nutrients to nutrient-poor agricultural cultural, spiritual and religious
with inadequate quantities and that their population needs. lands, easing the reliance on significance, as well as its role
variety of micronutrients — For example, in places commercial fertilizers that in providing food, clean water
known as hidden hunger. where protein production is pollute the environment and and healthy air.
These are findings of a adequate, shifting production deplete finite natural resources. The analysis revealed
detailed analysis of food to protein sources that are But the authors note that that several interventions
production by Ozge Geyik and higher in vitamin A and further research — on how carried unintended negative
colleagues at Deakin University iron could alleviate these best to recover and transport consequences. The production
in Burwood, Australia. The team nutrient shortfalls. Adding manure, for instance — will be of bioenergy, either with
gathered data on the nutrient micronutrients directly to soils needed to turn the vision into a or without carbon capture,
content of 174 individual foods and the leaves of crop plants is reality. planting forests and
produced across 177 countries another possible solution. commercial crop insurance
between 1995 and 2015. The Agric. Syst. 182, 102813 (2020) all had potentially negative
researchers analysed whether Glob. Food Sec. 24, 100355 consequences for both SDGs
individual countries and (2020) and NCPs. For example,
regions could meet the energy Intervention trade- bioenergy had large negative
needs of their populations, impacts on maintaining land
offs assessed
as well as supply them with Nutrient recycling biodiversity, freshwater quality
protein, iron, zinc, vitamin A, Transforming the way land and food production, despite
vitamin B12 and folate. possibilities mapped is managed and food is providing affordable clean
The study is one of the The age-old practice of produced could shore up energy. About one-third of
first to take such a detailed fertilizing crops with livestock food supplies and address the the interventions proposed
look at global patterns of manure has been reimagined challenges of climate change had no substantial trade-offs.
nutrient production using in a study led by Sheri Spiegal and biodiversity loss. But These included improving
disaggregated food data from the US Department of an assessment of proposed water management, increasing
over time. Previous work has Agriculture in Las Cruces, New interventions reveals that few soil organic carbon content,
typically grouped foods into Mexico. In the study, the team are up to the task of protecting reducing pollution, reducing
broad categories, such as introduces the concept of a both livelihoods and the post-harvest losses and fire
cereals, dairy and vegetable manureshed — land around environment. management.
oils, which can lead to under- livestock farms that could Pamela McElwee from The analysis could
or overestimates of specific benefit from the nutrient- Rutgers University in New help decision-makers to
nutrients. rich manure that those farms Brunswick, New Jersey, and assess environmental or
Global food production produce. her colleagues assessed the developmental policies to
increased steadily over the Spiegal and her colleagues benefits and trade-offs of avoid unintended trade-offs,
two decades, and outpaced mapped a patchwork of more 40 proposed changes to land the authors say.
increases in food requirements. than 3,000 counties across the management, food-production
However, on a regional level, the United States. They classified chains and the management Glob. Change Biol. 26, 4691–4721
analysis found that more than counties as manure sources if of environmental risks. The (2020)
half of the countries in Africa they could supply nutrients in potential interventions are
and Asia were not producing manure from livestock, or sinks outlined in the 2019 report
enough calories for their if the crops grown could use the from the Intergovernmental
For the latest
populations. nutrients from manure. Panel on Climate Change, and research published
In 2015, more than 20% of The work reveals a surfeit include improving management online by Nature visit:
http://go.nature.
the global population lived of opportunity to recycle of livestock, reforestation,
com/2k85xvs
in countries with inadequate nutrients. The researchers reducing consumer and retail

outlook
IAN WALDIE/BLOOMBERG VIA GETTY

Food labelling can help consumers to make healthier food choices.
Changing diets at scale

Researchers are working out how to achieve a widespread
change in eating behaviour. By Benjamin Plackett
If everyone ate a balanced diet featuring more plant-based and sustainable animal-sourced food, up to eight billion tonnes of carbon dioxide emissions might be avoided globally each year by 2050, according to the 2019 special report on climate change and land by the Intergovernmental Panel on Climate Change (IPCC).

Modifying diet on a global scale is a major opportunity to combat climate change, argues the report.

Naoko Ishii, an economist at the University of Tokyo’s Institute for Future Initiatives, agrees. “One of the biggest risk factors for the planet’s health is our food system,” she says. “The way we eat needs to change.”

That opinion might be gaining widespread acceptance, but scientists don’t know how to bring about the reforms needed, on the scale required.

Much of the world’s population, even in relatively rich nations, cannot afford the kind of sustainable plant-based diet that scientists favour. As the IPCC special report notes, mitigating climate change through dietary modification relies on consumers altering their choices and preferences. These, in turn, are guided by “social, cultural, environmental and traditional factors, as well as income growth”, the report says, all of which are hard to shift.

Studies on which levers for changing food behaviours work best are surprisingly scant. Most research concentrates on richer and Western countries, which is where the majority of behavioural changes are needed. By comparison, data on what needs to happen in poorer and subsistence-farming communities are almost non-existent. Because the food behaviours of these communities are thought to be much more sustainable than those of industrialized economies, the focus for these societies is less on pushing urgent changes and more on managing social changes to ensure unsustainable behaviours aren’t introduced.

The IPCC report lists school food procurement, health-insurance initiatives and public-awareness campaigns as examples of policies that can potentially change demand. But research to quantify the effects of various interventions, such as taxes, labelling or changing in-store food displays, suggests that

achieving behavioural change is not straightforward. The interactions between the factors that determine food production and consumption are complex, and the risk of unintended consequences from interventions can’t be ignored.

According to the World Health Organization, a sustainable diet means a large proportion of the global population, particularly in wealthy countries, will need to eat fewer processed foods and reduce waste (see ‘Waste not, want not’). Food producers around the world will also need to cut down on plastic packaging and use fewer antibiotics and hormones in livestock.

One 2019 review¹ concluded that a sustainable diet would budget for just 14 grams of red meat per day, which is roughly one steak per person, per month. Data from the Organisation for Economic Co-operation and Development (OECD) show that achieving such goals will be a more daunting task in some countries than in others. For example, Argentinians eat an average of 106.7 g of red meat per day, whereas Nigerians consume just 8.3 g.

“It’s not about being anti-meat, it’s just that the ratio of meat to plant-based food is off-kilter for a lot of us,” says Mark Lawrence, a public-health economist at Deakin University in Burwood, Australia. “It’s also about being better with what we have. Up to a third of food is wasted and that’s terrible given the environmental cost of making it.”

Waste not, want not

Between 2010 and 2016, food waste was responsible for 8–10% of human-caused greenhouse-gas emissions, according to the 2019 special report on climate change and land by the Intergovernmental Panel on Climate Change (IPCC). In a 2017 literature review of food-waste research⁹, which pooled the findings of 202 studies, the authors complained that researchers are often forced to fall back on old data because there is no up-to-date alternative.

The review found that the majority of research papers focused on Western countries. Switzerland, for example, wastes about one-third of the food it produces; Finnish consumers throw away around 30% of the food they buy; and the Danish discard 23%. The United Kingdom, with 52 mentions, was the most studied country, followed by the United States with 51. By comparison, lower-income countries or countries experiencing rapid development are rarely investigated. India, for example, was mentioned in just 12 (6%) studies, despite it being home to almost one-fifth of the world’s population.

A lack of data is hampering the search for a solution, but, as the IPCC report states, there is no panacea. Approaches are likely to differ depending on the country. In middle- and low-income countries, improving food-supply-chain logistics to ensure consistent access to refrigeration would go a long way to decreasing waste. But in high-income countries, more inventive solutions will be needed, the report says. For example, replacing 10–19% of animal feed with microbial protein recycled from sewage would reduce greenhouse-gas emissions associated with pastoral farming by 6–7%.

Cash is king

The data consistently show that one of the best ways to influence food behaviour is through price. If sustainable food were reliably cheaper than environmentally damaging products, market forces could often take care of the problem. But achieving that is no small task: sustainable food is often considerably more expensive than its conventional rival products. One study² estimated that the cost of a sustainable weekly food basket in Australia is up to 30% more than that of a standard one. That’s partly because sustainable practices often carry additional expenses. For example, reducing antibiotic use in animal husbandry means that welfare standards need to improve to keep infections low. That doesn’t come cheap, and the cost is passed down the supply chain.

Once a customer becomes accustomed to a price point, it can be tough to convince them to pay more. A survey of 600 city dwellers in Poland³ found that higher prices were the main barrier to them making more sustainable food choices, a pattern that held true even among respondents who were already interested in sustainability.

Although fatty foods might be cheap at the point of purchase, their true cost is reflected in lost productivity and the disease burden associated with obesity. The OECD estimates that in the future the gross domestic product (GDP) of a country will be 3.3% lower, on average, as a result of obesity. Similarly, the full cost of unsustainable food is not reflected in its price; the hit to a country’s GDP could be at least as big as that of obesity, Lawrence speculates. “There are a lot of distortions in the market where the true cost, environmental and economic, is not felt at the cashier,” he says.

So, what needs to be done to help sustainable products compete with their conventional or cheaper counterparts? Tax is the obvious answer. In the past ten years, tariffs have been levied on sugary drinks in many countries, including Barbados, Peru and the United Kingdom. A systematic review⁴ evaluating their effectiveness collected the findings of 15 studies, and concluded that, on average, for every 1% bump in price, there’s a corresponding 1% fall in consumption.

“Most existing sugar-sweetened beverage taxes are between 10% and 20%, so the effect on consumption is not trivial,” says Franco Sassi, a health economist at Imperial College London. “It actually makes me optimistic because back in 2010 we couldn’t have imagined that governments would ever tax sugar.”

And taxes might go beyond just sugar. In a 2017 study⁵, Lawrence and his colleagues asked 944 people who bought household groceries to choose between sustainable products and more conventional foods. In one of the scenarios, participants were told that brown rice had a lower carbon footprint than white rice. They were then asked to pick between the two. Under normal market conditions, in which white rice is cheaper, 61% opted for white rice. But when brown rice was presented as 9% cheaper than white, 57% instead chose the more sustainable option. That’s encouraging, says Lawrence, because it shows that a small price change can nudge enough consumers to give sustainable products the majority market share.

This pattern does not necessarily hold true for all products, however. The same experiment was conducted for beef steak and a sustainable alternative, kangaroo steak. Under normal market conditions, people preferred beef. And although some people drifted towards kangaroo meat when it was the cheaper option, beef remained first choice by a significant margin, even with a price difference of 33%. Price, therefore, is only one of a number of factors influencing whether consumers buy sustainable alternatives. “It can’t be just about making polite nudges here and there on price; that

won’t be enough,” says Lawrence.

JULIEN VIRY/ISTOCK/GETTY

Kangaroo meat is a sustainable alternative to beef, but consumers can be reluctant to switch.

Most of the evidence on changing food behaviours comes from work on tackling obesity. Findings from dietary studies with a focus on health are being examined for their applicability to the younger field of food sustainability. One of the methods routinely deployed to encourage healthy diets uses labels designed to inform consumers about the nutritional value of food. The traffic-light system in the United Kingdom, for example, gives shoppers an idea of how healthy a product is or isn’t at a glance. The evidence of the effectiveness of this sort of intervention is encouraging.

The OECD estimates that between 50% and 60% of shoppers check nutritional labels at least some of the time. Research established that labels indicating a product’s health credentials (or lack thereof) are linked to an 18% increase in people buying healthier food⁶. Labelling on health grounds influences food behaviour, says one of the authors of the study, Michele Cecchini, a health-policy analyst at the OECD’s health division in Paris. “I don’t see why the same wouldn’t also apply to other issues that consumers care about, like sustainability,” he says.

Ishii says that only a proportion of consumers need to change their behaviour for labelling information to have an impact. “A relatively small number can influence the brand to change, and therefore they can influence the wider supply chain,” she says.

Cultural matters

A 2020 survey⁷ of close to 1,200 people across 12 European countries and Uganda highlighted the influence that culture can have on food behaviours. For example, the majority of people surveyed in each of the European countries disagreed with statements such as “a particular food is chosen because it makes me look good in front of others”. In Uganda, however, more participants agreed with such statements.

“I don’t think we’ll be able to address food behaviour on a global level in a uniform way,” says Suzanne Kapelari, an educational scientist at the University of Innsbruck, Austria, and an author of the study. “The more we know about the cultural attitudes to food and behaviours, the better, but there’s quite a bit of work to be done on that.”

Food behaviours in higher-income countries such as many OECD member states are different from those in middle- and low-income countries. Consumers in wealthy countries buy more meat, and packaged and processed foods. “It’s been like this for decades in high-income countries,” says Lawrence. People in low-income countries, by comparison, often eat less meat and opt for locally produced products with less packaging.

The emphasis in high-income countries is, therefore, on correcting unsustainable behaviours, whereas in low- and middle-income countries, it’s on preventing unsustainable behaviours becoming the norm.

“We have to be careful here because we don’t want to be sitting in ivory towers telling middle-income countries that they can’t have access to convenience foods,” says Lawrence. The answer, he explains, is often to fix the macroeconomics. For example, in some Pacific island nations, tinned and imported foods are now cheaper than fresh fish from local waters. “This is often because international trade deals have effectively subsidized processed food,” he says. “It requires political will to correct this for an entire economy, but it doesn’t mean banning these products. It’s just about making sure the economics of the system isn’t skewed.”

Unintended consequences

Although eliminating or significantly reducing meat consumption would help the environment, evidence suggests this is unlikely to happen at scale because many meat-eaters are reluctant to change their eating habits. A better strategy, some researchers argue, is to shift consumer preferences from high carbon-producing meats, such as lamb and beef, towards meats with a lower environmental impact, such as chicken and pork.

In a 2019 study⁸, marketing experts in Belgium reorganized a butcher’s counter, increasing the space given to poultry and decreasing the space for red meat. This led to a 13% increase in chicken sales in 4 weeks. The only trouble is that sales of red meat didn’t fall in tandem, so the net result was a greater amount of meat sold, albeit not significantly. Although this was one small study, it demonstrates a broader point: there is no single solution to the problem of how to change consumers’ behaviour. “The common feature in all these areas is their limited effectiveness,” says Sassi.

The hope is that applying a range of methods in a coordinated way will have a cumulative effect. But that hope lacks a solid evidence base. Researchers are even unsure whether different groups respond to different methods. “The truth is that we don’t really know,” says Sassi. “It’s a gap in our evidence.”

Benjamin Plackett is a freelance science writer based in London.

1. Willett, W. et al. Lancet 393, 447–492 (2019).
2. Barosh, L., Friel, S., Engelhardt, K. & Chan, L. Aust. N. Z. J. Public Health 38, 7–12 (2014).
3. Rejman, K., Kaczorowska, J., Halicka, E. & Laskowski, W. Public Health Nutr. 22, 1330–1339 (2019).
4. Teng, A. M. et al. Obes. Rev. 20, 1187–1204 (2019).
5. Hoek, A. C., Pearson, D., James, S. W., Lawrence, M. A. & Friel, S. Food Qual. Pref. 58, 94–106 (2017).
6. Cecchini, M. & Warin, L. Obes. Rev. 17, 201–210 (2016).
7. Kapelari, S. et al. Sustainability 12, 1509 (2020).
8. Coucke, N., Vermeir, I., Slabbinck, H. & Van Kerckhove, A. Foods 8, 186 (2019).
9. Xue, L. et al. Environ. Sci. Technol. 51, 6618–6633 (2017).

SPONSOR FEATURE
MINT IMAGES/MINT
IMAGES RF/GETTY IMAGES
BUILDING BETTER FOOD SYSTEMS

FOR NUTRITION AND HEALTH
In advance of THE 2021 NUTRITION FOR GROWTH SUMMIT IN TOKYO, a group of leading
international academics, policy analysts and food industry representatives attended a workshop to
discuss global nutrition challenges. To address the obstacles to better health, participants discussed
the stakeholders and levers that can effect change so that nutritious — and sustainable — diets are
available for all.
Although hunger was steadily declining for decades, progress stalled in 2015¹ and since then, the number of people who suffer from hunger has slowly increased. In 2018, more than 820 million people were hungry². Around 2 billion people worldwide have micronutrient deficiencies and 149 million children are stunted³. And yet in 2018, 40 million children under five were overweight, and in 2016, 2 billion adults were overweight, with a third of these obese. Many countries are now challenged by what’s known as the double burden of malnutrition (DBM), in which people are simultaneously overweight and malnourished.

The double burden of malnutrition is closely associated with nutrition transition⁴, which is tracking both demographic transition (where countries shift from patterns of high to lower fertility and mortality) and epidemiological transition (where the prevalent disease burden shifts from infectious to chronic and degenerative disease). With urbanization, economic growth and technological change, diets shift from starchy, low variety, low fat and high fibre towards the ‘Western’ pattern of increased fat, sugar and processed foods. The effects of malnutrition have also been shown to be intergenerational, where maternal nutrition can set a trajectory for life-long health of her offspring⁵.

In the 1990s, DBM mostly affected the highest income countries of the group of low- and middle-income countries (LMICs), but in the last decade it has been seen in even the poorest of countries, including in rural areas and in people with low incomes. DBM is seen in around 14 million children in Asia and 9.6 million children in Africa.

Syndemics

The global synergistic epidemic (the ‘syndemic’) of overnutrition, undernutrition and climate change was described as the greatest challenge for human and planetary health in the 21st century⁶ and has been compounded by unprecedented challenges to food and nutrition security in 2020. In the biggest upsurge seen in decades, desert locust outbreaks in East Africa and South Asia have had disastrous effects on local food supplies. COVID-19, a pandemic inextricably linked in cause and effect to food systems, could push up to 100 million people into extreme poverty⁷. Lockdowns to deal with the spread of COVID-19 disrupted global supply chains and national, local and household economies. With the increase in poverty, and hindrance of essential interventions and food security, COVID-19 could reverse many of the hard-won gains in maternal and child nutrition of recent decades. Furthermore, overweight and obesity are associated with increased likelihood of hospitalization, ICU admission and worse outcomes in COVID-19.

Building solutions

It is now evident that human and planetary health cannot be disentangled, nor can nutrition be considered in
isolation from the food system. The food system concept, as put forward by Parsons and Hawkes⁸, describes the interplay between the food supply chain and social, political, economic and environmental outcomes. It puts the individual at the core and yet, power concentrations within the food system exist and markedly skew the attainment of health and sustainable and affordable diets for all. Malnutrition is a consequence.

Meanwhile, nutrition remains everyone’s business. Food systems are a major source of livelihoods globally. It is estimated that one in three workers are employed in the agricultural sector alone⁹. A Chatham House report¹⁰,¹¹ showed that businesses in 19 countries lose up to an estimated US$38 billion a year from lost worker productivity because of undernutrition, and up to an estimated US$27 billion a year because of obesity. The authors conclude there is an incentive for the food industry to invest in better nutrition, because it could drive a stronger and more productive economy generally, as well as helping its own productivity. Around 80% of food is produced by the private sector (mostly small- and medium-sized enterprises), meaning communication of the benefits of transformation will need to be multi-faceted, multi-sectoral, long-term campaigns.

Within that transformation, food systems geared towards nutrition must also be geared towards sustainability. The food system contributes about one-third of anthropogenic greenhouse gas emissions, diminishes biodiversity, pollutes water and degrades soil.

And the damage caused by the climate emergency will exacerbate changing diets. Around two-fifths of the world’s population is unable to afford a healthy diet. This has a huge impact on the health of the population, as poor diet, whether undernutrition or over-consumption, plays a significant role in health and health risks. Dietary changes are driven by a country’s wealth and level of urbanization, and dictated by increasing inequalities in society. The disruption to food supply as a result of climate change will only serve to throw up further challenges to providing nutrition for all.

At the Food Systems for Nutrition and Health workshop held in advance of the 2021 Nutrition for Growth Summit in Tokyo, three breakout sessions considered the role of policy, the role of the private sector and the role of the nutrition research community in bringing about necessary change to global food systems in order to provide healthy, nutritious and sustainable diets for all. While each session came at the challenge from different angles, a common theme was the need for collaborative, multi-stakeholder approaches to change. The reports continue on following pages.

(FROM TOP LEFT TO BOTTOM RIGHT) MONKEYBUSINESSIMAGES/ISTOCK / GETTY IMAGES PLUS; SCIENCE PHOTO LIBRARY/GETTY IMAGES; SZEFEI/SHUTTERSTOCK; PEETERV/ISTOCK / GETTY IMAGES PLUS

Economic, social, political and other factors determine what people eat.

REFERENCES
1. FAO, IFAD, UNICEF, WFP and WHO. The State of Food Security and Nutrition in the World 2019. Safeguarding against economic slowdowns and downturns. (2019).
2. FAO, IFAD, UNICEF, WFP and WHO. The State of Food Security and Nutrition in the World 2018. Building climate resilience for food security and nutrition. (2018).
3. UNICEF. The State of the World’s Children 2019: Children, food and nutrition. (2019).
4. Popkin, B. & Gordon-Larsen, P. The nutrition transition: worldwide obesity dynamics and their determinants. Int J Obes 28, S2–S9 (2004).
5. Black, R. et al. Maternal and child undernutrition and overweight in low-income and middle-income countries. Lancet 382, 427–451 (2013).
6. Swinburn, A. et al. The Global Syndemic of Obesity, Undernutrition, and Climate Change: The Lancet Commission report. The Lancet Commissions 393, 791–846 (2019).
7. We all stand together. Nat Food 1, 583 (2020).
8. Parsons, K. and Hawkes, C. WHO Policy Brief 31: Connecting food systems for co-benefits: How can food systems combine diet-related health with environmental and economic policy goals? (2018).
9. FAO. FAO Statistical Yearbook 2012. World Food and Agriculture. (2012).
10. Wellesley, L. et al. Chatham House report: The Business Case for Investment in Nutrition. (2020).
11. Pringle, S. Tackling Malnutrition: Harnessing the Power of Business. Chatham House Expert Comment (2020).
Event Organizers: Springer Nature, Nature Food, RESULTS Japan, GGG+
Supported by: Ministry of Foreign Affairs of Japan; Ministry of Health, Labour and Welfare of Japan; Ministry of Agriculture, Forestry and Fisheries of Japan
Event Sponsors: Ajinomoto Group
BREAKOUT SESSION: NUTRITION FOR HEALTH

AHIRAO_PHOTO/ISTOCK / GETTY IMAGES PLUS
EFFECTING CHANGE IN
A FRACTURED LANDSCAPE
The double burden of malnutrition (DBM), the combined problem of too many calories and simultaneous malnutrition, is increasing in low- and middle-income countries. This, along with the climate emergency and COVID-19 pandemic, is having a significant impact on global health and increasing the risk of non-communicable diseases.

The Nutrition for Health breakout session of the Food Systems for Nutrition and Health workshop, comprised of experts from industry, government and academia, discussed existing policies to tackle DBM. Examples include Ghana’s ‘Planting for food and jobs’, which has helped communities to expand their land use, and community dams that support fish farms and improve land irrigation. Japan worked to improve undernutrition following the Second World War, and overweight and obesity following economic growth, through education and improving access to healthy and nutritious food, including for low-income populations as part of universal health coverage.

Tackling DBM needs behaviour change of all food system actors and consumers, backed by a deep understanding of what drives people’s decisions in nutrition and health. All stakeholders, including governments, food companies, the World Health Organization, the UN Food and Agriculture Organization, the International Union of Nutritional Sciences, non-government organizations and policy makers, will need to be involved to facilitate behavioural change and provide food environments that incentivize healthy eating. Government approaches could include taxing highly-processed foods and sugary drinks, improving transport infrastructure for affordable delivery of food, providing safety nets for consumers who cannot afford a healthy diet, investing in public awareness campaigns, and making nutrition a focus for health both in schools and for the general public. Companies need to invest in innovation to improve the nutritional value and safety of their food, but not at the expense of palatability. Investment in R&D is expensive and needs support to develop a sustainable business case that does not compromise affordability.

Change is even more important in light of the COVID-19 pandemic, which demonstrated the impact of overweight and nutrition on the progress of infectious disease. The strategies to keep the spread of COVID-19 under control have left some people isolated, and many have lost their jobs and experienced serious economic hardship. The experience of countries such as Japan, which has survived natural disasters, has shown the importance of nutritional preparedness. While staples that can be stored long-term are a defence against hunger, food support should be safe and nutritionally balanced, with the right levels of protein, lipids, vitamins, minerals and dietary fibre. The store of staples can be buttressed by local production of nutrient-dense food that is accessible and affordable.

PERSPECTIVES FROM JAPAN: TOWARDS NUTRITION FOR GROWTH, TOKYO 2021

Inadequate nutrition is still a leading cause of global deaths.

“Figures show that around 38%, or 5.3 million children, are dying before their fifth birthday. According to a 2013 Lancet paper, 45% of those deaths are due to malnutrition. In other words, it is estimated that if those infants had been able to get adequate nutrition, they would not have died from infectious diseases such as diarrhoea, measles, acute respiratory infections, malaria, tuberculosis and AIDS,” said SANTO Akiko, President of the House of Councillors, the National Diet of Japan.

Nutrition has an important role to play in prevention and control of disease, including COVID-19.

“Japan achieved a dramatic reduction in the morbidity and mortality of communicable diseases…through nutrition improvement after the Second World War. Recent studies…have shown a close relationship between COVID-19 and malnutrition. Therefore, taking measures to improve nutrition is critical,” said SHOBAYASHI Tokuaki, Director General, Health Service Bureau, Ministry of Health, Labour and Welfare.

This requires a shift towards better food and healthier diets.

“COVID-19 [has] brought a significant demand shift from eating out to home,” said OSAWA Makoto, Vice-Minister for International Affairs, Ministry of Agriculture, Forestry and Fisheries, Japan. “We take this opportunity to propose healthier and more sustainable dietary habits by promoting traditional local diets based on local production for local consumption.”

Achieving an improvement in diet and sustainable eating requires a policy shift.

“Investment in health system strengthening is a prerequisite for sustainable development and economic growth. Good nutrition is a foundation for healthy lives and sustainable health systems,” said MIMURA Atsushi, Deputy Vice Minister of Finance for International Affairs, Ministry of Finance.

The World Bank’s Human Capital Project is an important part of the process.

“The Human Capital Project considers nutrition as an essential element in unlocking human potential and economic growth,” added Mimura.

The current global situation and the forthcoming summit provide an opportunity to transform the way the world tackles the global malnutrition challenge.

“We expect countries to review their existing nutrition policy, consult with other nutrition stakeholders and announce ‘SMART’ commitments at the Tokyo N4G Summit 2021,” said ONO Keiichi, Ambassador, Director-General for Global Issues, Ministry of Foreign Affairs of Japan.
The panel heard that the One Health Approach, which looks at the interaction between human and animal health and the environment, will be critical, particularly with the threat of future zoonotic diseases. Approaches aimed at improving human health also need to consider animal health and the environment, and this requires collaboration and cooperation between researchers involved in all these fields. Education at all levels (including medical schools) is required for precision planetary, population and personal nutrition and health.

Data and evidence were also discussed as a global challenge related to nutrition. Researchers need data collected at a local level to track DBM, but there isn’t enough available. Better and more granular data, collected more frequently, will help to track diet, weight, blood sugar and/or health. This can be improved by working with local academic institutions, food companies and schools, and by making the most of locally available and low-cost point-of-care technologies and innovations.

To really make an impact, the nutrition community needs to be involved in the political decision-making process. As well as being able to provide support and evidence, nutrition professionals should assess the trade-offs and challenges that policymakers face and feed this into education and research. Countries like Sri Lanka and Japan have nutritionists embedded within their ministries of health. Japan also has dedicated nutritionists working within the Ministry of Agriculture, Forestry and Fisheries, the Ministry of Education, and the consumer affairs agency. In order to beat DBM and other nutritional challenges, nutritionists need to reach out to, and work with, other sectors to find common solutions.
BREAKOUT SESSION: BUSINESS FOR NUTRITION
ARE HEALTHY PROFITS HOLDING BACK HEALTHY DIETS?

There are many barriers to the wide adoption of sustainably produced food for healthy diets. Everything from nutrition education, through to accessibility, convenience, trends and marketing have a hand in consumer choice. With a large percentage of the world’s population reliant on cheap calorie-rich staples and unable to afford more nutrient-rich food, cost is perhaps the largest obstacle.

A panel of leading food industry, academic and government representatives formed a break-out from the Food Systems for Nutrition and Health workshop to discuss the relationship between profit and public good.

Food manufacturers often produce foods with high levels of calories, salt, sugar and other unhealthy attributes. For lack of a better term, such foods are sometimes labelled as ‘ultra-processed’, though the concern is not with the degree of processing, but rather with the nutritional content considered as unhealthy.

Better product labelling is one approach to changing consumer behaviour. (TETRA IMAGES/GETTY IMAGES)

A report from Chatham House entitled The Business Case for Investment in Nutrition suggests that making businesses aware of the adverse impacts of poor nutrition on their profits and market growth could be a persuasive approach. Catering for more nutritious diets, not only through product lines but also food in the workplace, could benefit both the health of food company employees and that of the population as a whole. It could also contribute to increased profits for companies across all sectors. The panel heard about well-known global brands that are inspiring consumers to use more nutritious products to create nutrient-rich and affordable meals.

The panel discussed how farm subsidies could be directed to incentivize the production of healthy foods such as fresh fruits and vegetables in a more sustainable manner. Governments can also introduce minimum standards in public procurement policies.

Aside from awareness-raising about healthy diets, governments can further have a role in influencing dietary preferences through educational campaigns, food labelling, public regulations, taxing unhealthy foods and other measures.

The panel heard that some regulatory flexibility may be needed to allow for product reformulation. Companies indicated that such flexible frameworks help attract more investment in healthier options. By increasing the demand for healthier food, this could have a knock-on effect of cutting production costs through innovation and economies of scale.

Markets for ready-to-eat meals, snacks and other comfort foods are growing and proven to be highly profitable. During the COVID-19 pandemic, demand for comfort food has increased. However, many of these (ultra-)processed foods can be calorie-dense, high in added sugars and salt, and low in nutrients. The panel put forward examples of governments that have taken different approaches to improving consumer food selection. Chile, Mexico, the UK and South Africa have introduced taxes on sugary drinks, accompanied by marketing regulations, which have driven companies to change business strategies. Indonesia has focused on improving nutritional literacy to promote behaviour change. Japan has nutrition training in schools. Israel is taking a more positive approach by labelling the healthy foods.

The panel acknowledged that the COVID-19 pandemic has impacted national economies, and constrained budgets, which will make setting aside resources for improving nutrition more difficult for both domestic governments and overseas donor governments. Innovative funding sources and private sector finance will need to step up. Investment in research and development will need to come from industry, not-for-profit funding groups and governments.

Consumer behaviour and choice are shaped by advertising and marketing messages. The panel agreed that changing behaviours and shaping markets, for both industry and consumers, will need a multi-stakeholder approach, and transparent and evidence-based discussions with clear accountability.

The first United Nations Food Systems Summit in 2021 along with the Nutrition for Growth Summit due to be held in Tokyo in 2021 provide good opportunities to advance this agenda.
BREAKOUT SESSION: POLICY
POLICY LANDSCAPE SUPPORTING BUSINESS AND NUTRITION

Ending hunger and malnutrition requires political will. In a breakout session held two weeks after the main Food Systems for Nutrition and Health workshop, a panel of expert representatives from business, academia and government discussed how food policy needs to shift to drive nutrition for health and well-being and the sustainable production and consumption of nutrient-dense foods. However, as the Business for Nutrition breakout session also discussed, many of the food policies in place around the world focus on producing staples, and on maintaining the agriculture industry and tend to support the consumption of energy-dense, nutrient-poor foods.

Access to low-cost, nutrient-dense, sustainably produced food is the goal. (SUBMAN/E+/GETTY IMAGES)

The panel heard how Thailand has been successful in using policy to improve the nation’s nutrition. In the 1970s, evidence from research and surveys convinced the government that nutrition was essential for development, and that undernutrition in the country was linked with poverty. The resulting poverty alleviation plan led to a multisectoral approach and unlocked funding that supported community-level planning and implementation, which was important to ensure ownership. The integrated actions, including health, agriculture and education, the widespread mobilization of village volunteers, and the support from officials and experts resulted in a remarkable decline in child undernutrition within a decade. As Thailand has become richer, the population began to experience DBM, with increasing levels of obesity and simultaneous malnutrition.

The government introduced taxes in 2001 to tackle levels of tobacco and alcohol use, with a percentage of the proceeds going to health promotion. In 2017, building on a programme targeting children called ‘Sweet Enough’, the government deployed an evidence-based sugar tax, using a tiered scale and giving companies two years’ notice to reformulate their products. The success of these programmes has inspired further campaigns and activities; most recently the government considered the merits of a salt tax.

Beyond public health, the panel heard that nutrition policy must also focus on planetary health, and this will require policies that create links between food, health, education, biodiversity, sustainability, the environment, and the climate emergency. Successful policies will rely on all the players working together, including communities and small to medium enterprises. Taxes can play an important role, especially if they are reinvested in health, nutrition and society. There should also be recognition and incentives to encourage stakeholder involvement. This ‘planetary health’ message needs to be built into nutrition research, and into training for nutrition professionals.

Redirecting current subsidies and introducing new subsidies could play a role in supporting both public and planetary health, because at the moment subsidies do not necessarily support the production of healthy food in a way that is either efficient or sustainable.

One of the biggest challenges in nutrition policy is effectively engaging and communicating with the general population. The most effective messages are tangible, harmonized across stakeholders and come from a credible source. Countries can learn from one another, but the messaging needs to be tailored to the social, religious and cultural climate of the location.

The COVID-19 pandemic has shown the impact that global emergencies can have on food systems. The panel discussed the importance of learning from the crisis, and ‘building back better’. Food and nutrition need to be part of any emergency and recovery plans, and part of plans for the health of the nation and the planet. This will require stakeholders from academia, industry, nutrition science and the community to be invited to the table.



Detection of large-scale X-ray bubbles in the