The roll out of COVID-19 vaccines is under way, but without, it seems, much global coordination where timing is concerned. China, Russia and the United Arab Emirates began administering vaccines before the conclusion of clinical trials. Last week, the United Kingdom issued emergency approval for a vaccine developed by the US biopharmaceutical company Pfizer and BioNTech of Germany following positive results from phase III testing (see page 205). The US Food and Drug Administration (FDA) has needed longer to make its decision on the same vaccine. And the regulatory agencies of Australia, the European Union and Switzerland are taking longer still.

This patchwork of different approvals processes, despite COVID-19 being the one common enemy, has revived a long-standing question of how to enhance harmonization in vaccine regulation. Researchers reviewing the regulatory landscape found at least 50 pathways to various types of accelerated vaccine approval in a group of 24 countries (S. Simpson et al. npj Vaccines 5, 101; 2020).

Greater harmonization would bring many benefits. Drug companies could look forward to agreed definitions for different types of approval, and would also benefit from agreed guidelines for criteria that their vaccine candidates would need to meet. If countries’ regulators were to ask for broadly the same things, companies could cut the time needed to prepare their drug applications. Companies, for their part, would need to allow, or help to create, a secure way for regulators to share data, which they are often not permitted to do at present.

By assessing the same data, regulators could more easily compare their findings and analyses with those of others, and their decisions would not only be more robust, but also be seen to be so. That, in turn, would shore up public confidence in a world in which vaccine hesitancy is rising and in which many citizens already have the means to compare regulatory verdicts. This would be an evolutionary shift, not a revolutionary one, because in recent years, and particularly after the Ebola crisis, regulators have made unprecedented efforts to discuss, coordinate and begin to harmonize some of their processes.

The FDA, which was set up in 1906, is the world’s oldest national medicines regulator. But the world has been moving towards greater regulatory coordination for some time. Europe’s regulatory system comprises a network of

are on medicines own medical regulators, but the EMA provides manufacturers with a single place for scientific evaluation of drug

ties (ICMRA) to allow regulators to share information and agree on approaches. The ICMRA has 29 members, including regulators from China, Europe and the United States. Through it, members have been able to reach a consensus on the best animal models for testing COVID-19 vaccines, the ideal clinical-trial end points and the complicated issue of continuing placebo-controlled trials after vaccine roll out begins. The coalition’s COVID-19 working group is now trying to harmonize the monitoring of vaccines once they have been deployed, because faint signals of adverse effects might be too weak to spot in any one country.

And then there is the WHO itself. Low- and middle-income countries can now benefit from the work that goes into its Emergency Use Listing (EUL) process. On 13 November, the agency issued its first ever such vaccine listing for a new polio vaccine. Around the end of October, the WHO requested that both the FDA and the EMA assess the suitability of COVID-19 vaccines for low- and middle-income countries as they consider whether to issue emergency authorizations. It is not clear whether the regulators will agree, but if either does, the WHO can draw on that analysis and issue its own EUL within days of the decision. That would be collaboration indeed.

These are all important and necessary efforts. The need now is to go a step further and find a path through the many different types of vaccine approval. Before the pandemic, the Coalition for Epidemic Preparedness, a global group of funding agencies, companies and non-governmental organizations, set up a working group to map out obstacles to better regulatory alignment, in anticipation of a new infectious disease. This process confirmed how regulatory agencies differ on issues such as the use of genetic modification in vaccine development, trials in pregnant women, and even vial labelling. But it also meant that inconsistencies were already mapped out and under discussion when the pandemic struck. COVID-19 has intensified these discussions.

The next step will not be easy. Regulators want to be able to exchange data. Their experiences during the pandemic have convinced many that they are moving towards a point at which this will be possible. They want to be able to talk to each other in the same units and about the same end points; and to make decisions based on the same data.

Ultimately, each country must make its own decisions about what’s best. But the goal of a harmonized regulatory dossier for vaccines, conforming to an agreed set of international regulatory requirements, would be transformative.
and gender makes science better

The European Commission is set to insist on steps that will make research design more inclusive.

At the end of last month, the European Commission announced that its grant recipients will be required to incorporate sex and gender analyses into the design of research studies. The policy will affect researchers applying for grants that are part of the commission’s seven-year, €85-billion (US$100-billion) Horizon Europe programme, which is due to begin next year.

The funding is still awaiting sign-off from the European Union’s 27 member states. But if all goes to plan, the commission will be the largest funder to require sex and gender analyses, along with analyses of other aspects of inclusion (also known as intersectionality), in research design. Such analyses could include disaggregating data by sex when examining cells, or considering how a technology might perpetuate gender stereotypes.

It’s a significant achievement. Science will be strengthened by researchers incorporating analyses of sex and gender into their work at every stage, from study design to gathering data, analysing those data and drawing conclusions.

The European Commission is not the first funding agency to make such changes. And this isn’t the first time it has requested that studies account for sex and gender. But in Horizon Europe, the requirement becomes a mandate, and is expected to extend, by default, to most grant recipients. Exceptions will be made only for those working on topics for which the commission thinks such studies would not be relevant, such as in pure mathematics.

Science and scientists have a troubled history of failing to account for sex and gender when designing research. For decades, crash test dummies were based on male bodies. Even though smaller models are now used to represent women, they fail to account for some other typical differences, such as neck strength [1]. The inclusion of sex and gender analyses can also be revelatory. Sea turtles in Australia’s Great Barrier Reef are being born mostly female because of warming temperatures, a discovery that was made when researchers were able to analyse male and female populations [2].

In some cases, the results of not accounting for sex and gender have been catastrophic. Between 1997 and 2001, ten prescription drugs were withdrawn from use in the United States, eight of which had been found to be more dangerous for women than men. This had been missed, in part,

trials, it remains a work in progress in many fields, Londa Schiebinger, a science historian at Stanford University in California, told Nature (see page 209). Researchers have been highlighting the harms caused by failing to account for sex and gender for decades, but it wasn’t until after the turn of the millennium that funding bodies really started to address the problem. The Canadian Institutes of Health Research began to request that analyses of sex and gender be included in grant applications in 2010, and the US National Institutes of Health followed suit in 2016.

The European Commission began asking grant recipients to include sex and gender analysis in their research design in 2013, a request which, by 2020, covered around one-third of research fields. But according to later evaluation reports, fewer researchers than expected implemented this request.

An analysis of researchers funded by the Canadian Institutes of Health Research, published in 2014, revealed that some had pushed back when asked to consider sex and gender [3]. And both this analysis and the European Commission’s evaluation highlighted that some grant recipients used sex (which refers to biological characteristics) interchangeably with gender (which is a social construct and is not necessarily aligned with a person’s sex). To help researchers to better appreciate the value of sex and gender analysis, the commission’s expert advisory group, which Schiebinger chairs, has published 15 case studies as examples of good practice (go.nature.com/33vxcxz).

Another positive action could be for research teams to include appropriate specialists to advise on, participate in, or lead the design of more-inclusive research. Groups could include researchers from the social or health sciences: the Canadian Institutes of Health Research analysis revealed that health- and social-science researchers are more likely to include sex and gender analyses in project design than are researchers in the biomedical sciences.

Ultimately, inclusive research design cannot be the sole responsibility of funders. Some journals, including Nature, are requesting that authors include sex and gender analyses, when appropriate. Universities and research supervisors also need to incorporate inclusive design into the research methodology training they provide to students.

The European Commission is rightly adding its considerable voice to the effort to ensure that science is designed and carried out in a more inclusive way. But to change practices that have existed for centuries, more researchers, especially research leaders, need to accept where they have been going wrong, and how research and individuals have suffered as a result. The foundations are being laid for better science, and the more hands join in this important effort, the better.

1. Linder, A. & Svedberg, W. Accid. Anal. Prev. 127, 156–162 (2019).
2. Jensen, M. P. et al. Curr. Biol. 28, 154–159 (2018).
3. Johnson, J., Sharman, Z., Vissandjée, B. & Stewart, D. E. PLoS ONE 9, e99900 (2014).
World view
By Ulrich Dirnagl
Five years ago, I was part of a small group of ‘activists’ who convinced the Berlin Institute of Health (BIH), where I work, to try out a set of reforms intended to improve the trustworthiness, usefulness and ethics of research. Things grew from there: three years ago, with the help of government grants and some nudging by a retired local politician, we secured €2.5 million (US$2.9 million) per year for efforts to build up incentives and technologies that increase rigour.

We were inspired by initiatives at other universities, such as the reforms that Frank Miedema introduced during his deanship at the University Medical Center Utrecht in the Netherlands. But when the QUEST Center (QUEST stands for Quality, Ethics, Open Science and Translation) launched at the BIH, there was no precedent or blueprint for a programme of this scale.

From the beginning, we presumed that researchers and clinician–scientists are skilled professionals who want to ‘do the right thing’ but are also under pressure to accrue publications to advance their careers. Doing quality research takes time and humility, so unless we changed the system, researchers who pursued quality-enhancing practices could have found themselves at a disadvantage.

What was the solution? We made sure that we were viewed as a resource, not a policing unit. We selected interventions that we thought we could implement. Alongside introducing courses on experimental design and methods aimed at reducing bias, we focused on practices to increase the transparency of research. One push was for the use of electronic laboratory notebooks (ELNs), which improve research documentation and make collaboration easier. We made sure that QUEST, and not individual labs, covered the licence fees and provided plenty of support. So far, nearly 2,000 of our 7,000 researchers, PhD students and technicians are registered ELN users; my guess is that about half of these have an ELN as their primary lab notebook. For many, ELNs are a necessary first step towards systematically managing their research data, which QUEST also supports.

We simultaneously adjusted the incentive and reward system. When hiring professors and awarding institutional funds, we now consider how thoroughly and quickly people share their results. Those who make original data available in publications are rewarded with a financial bonus that can be spent on research. QUEST works with the BIH and the leadership of the Charité, Berlin’s university medical centre, to ensure that evaluation criteria encompass responsible research practices, including publication of null results, provision of open data and community

We tried to craft a system designed for its own improvement. For example, we have developed an anonymous online tool through which researchers have reported hundreds of errors and worrying incidents (U. Dirnagl et al. PLoS Biol. 14, e2000705; 2016). This has allowed us to learn from errors: for example, a technician realized that ambiguous labelling of cell-culture media by a manufacturer had spoiled her experiment. Her swift reporting prevented others from making the same mistake. The company changed the labels on its flasks and alerted other customers. After we saw many errors stemming from the use of pipettes outside the calibrated range, we set up ‘pipetting exercises’ and saw the rate of these errors fall.

Three years in, we’re seeing more papers published open access and with open data. We’re also seeing greater participation in educational activities and in intramural programmes using responsible selection criteria, such as engagement with patient communities, reuse of data or preregistration. Of course, funders and journals are also pulling in the same direction, so it is impossible to know which changes are due to the efforts of QUEST.

However, we still have a long way to go. Our benchmarking study found that, within 2 years of completion, only 40% of studies sponsored by the Charité had reported results (S. Wieschowski et al. J. Clin. Epidemiol. 115, 37–45; 2019). Furthermore, 5 years after completion, more than 30% of results remained unavailable. But we hope to correct this. We use counselling and web tools to offer guidance on how to publish null, inconclusive, negative and other ‘nonstandard’ results, and award monetary research bonuses for the publication of negative results or replication studies.

Most faculty members welcome our activities, and we are working to expand student and researcher engagement. For example, using funding from the biomedical research charity Wellcome in London, we have established fellowships for mid-career researchers who collaborate to develop and track initiatives for improving science in their own research groups. Our experience shows that structured programmes can be rolled out by any academic institution that is willing and able to improve its research in a systematic fashion. The budget of QUEST is less than 1% of our institution’s state funding for research and teaching, not including monies from third-party funders.

QUEST started from scratch. But many institutions already promote activities such as open science, data management and responsible research. If they align their efforts, they can expand them and incorporate scientific ideals into incentive structures. The quality of science and the culture of the workplace will be better off.

Ulrich Dirnagl directs the QUEST Center at the Berlin Institute of Health.
e-mail: ulrich.dirnagl@charite.de
News in brief
COVID-19 VACCINES ARE NOT BEING SHARED EQUALLY

Vaccine developers who have already reported promising phase III trial results against COVID-19 estimate that, between them, they can make sufficient doses for more than one-third of the world’s population by the end of 2021. But many people in low-income countries might have to wait until 2023 or 2024 for vaccination. Five rich countries and the European Union have pre-ordered about half of expected capacity for 2021, according to data from Airfinity, a life-sciences market analytics firm in London. Canada leads on vaccine deals per capita, with nearly nine doses per person. Most low- and middle-income countries will rely on contributions from COVAX, a joint fund for equitable vaccine distribution.

BEST AND WORST SUPPLIED
Canada has pre-ordered almost 9 doses of COVID-19 vaccines per person.
[Bar chart. Axis: doses per person, 0 to 10. Series: pre-ordered; potential for expansion in deal. Rows: Canada, United States, United Kingdom, Australia, European Union, Japan, Vietnam, India, Israel, Switzerland, Indonesia, Brazil, Latin America (excl. Brazil), Egypt, Mexico, China, COVAX.]
SOURCE: DATA FROM AIRFINITY, UP TO 19 NOVEMBER/NATURE TABULATIONS

MILKY WAY MAP REVEALS ONE BILLION STARS IN MOTION

A huge data update from the Gaia space observatory, which is tracking more than one billion stars in the Galaxy, offers a picture of what Earth’s night sky will look like for 1.6 million years to come.

The European Space Agency probe lifted off in late 2013, and began observing stars in July 2014 from a perch 1.5 million kilometres from Earth. Gaia continuously scans the sky as it slowly spins, and it has now measured the positions of the same stars multiple times. This enables scientists to track stars’ nearly imperceptible motions across the Galaxy year after year, and to triangulate their positions using a technique called parallax.

The latest update is based on around three years of data, and includes a complete census of the Sun’s neighbourhood: all but the faintest stars within 100 parsecs (326 light years), totalling more than 300,000 objects. The mission has expanded its catalogue of stars by 15%, and its measurements have become more precise.

The data will underpin studies that range from the origins and evolution of the Galaxy to locating its dark matter.

Arecibo telescope collapses in gut-wrenching display

The iconic radio telescope at the Arecibo Observatory in Puerto Rico has collapsed, leaving astronomers and the Puerto Rican scientific community to mourn its demise.

Engineers had warned that the 900-tonne platform suspended above the telescope’s 305-metre-wide dish could fall at any moment, given that one of the main cables supporting it had snapped in early November. Last month, the US National Science Foundation, which owns the observatory, announced that it would shut down the telescope permanently, citing safety concerns over its instability, and damage too extensive to repair.

The platform plummeted into the dish after some cables failed just before 8 a.m. local time on 1 December. No one was injured.

Once the world’s largest single-dish radio telescope, the Arecibo facility has been the site of many key astronomical discoveries over the years, including observations of the spinning stars known as pulsars that led to the 1993 Nobel Prize in Physics.

“Our hearts are heavy about this,” said Thomas Zurbuchen, NASA’s associate administrator for science, at a 1 December NASA advisory meeting.

It is unclear whether the dish will be demolished, rebuilt or left in ruins.
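The Gaia item above mentions parallax and quotes a census radius of 100 parsecs (326 light years). As a rough illustration only, and not a description of how the Gaia team processes its data, the textbook relation between parallax angle and distance (distance in parsecs equals 1 divided by the parallax in arcseconds) can be sketched in a few lines. The function names and the rounded conversion factor of 3.2616 light years per parsec are choices made for this sketch, not taken from the article.

```python
# Illustrative sketch of the textbook parallax relation, d = 1 / p.
# This is NOT the Gaia data-processing pipeline; names here are hypothetical.

LIGHT_YEARS_PER_PARSEC = 3.2616  # approximate standard conversion factor


def parallax_to_parsecs(parallax_arcsec: float) -> float:
    """Distance in parsecs for an annual parallax angle given in arcseconds."""
    if parallax_arcsec <= 0:
        raise ValueError("parallax must be a positive angle")
    return 1.0 / parallax_arcsec


def parsecs_to_light_years(parsecs: float) -> float:
    """Convert a distance in parsecs to light years."""
    return parsecs * LIGHT_YEARS_PER_PARSEC


# A star at the edge of the 100-parsec census shows a parallax of 0.01 arcseconds.
distance_pc = parallax_to_parsecs(0.01)
distance_ly = parsecs_to_light_years(distance_pc)
print(f"{distance_pc:.0f} pc is about {distance_ly:.0f} light years")
```

The smaller the measured angle, the farther the star, which is why more-precise angle measurements extend the distance over which positions can be triangulated.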
News in focus
An artificial intelligence (AI) network developed by Google AI offshoot DeepMind has made a gargantuan leap in solving one of biology’s grandest challenges: determining a protein’s 3D shape from its amino-acid sequence.

DeepMind’s program, called AlphaFold, outperformed around 100 other teams in a biennial protein-structure prediction challenge called CASP, short for Critical Assessment of Structure Prediction. The results were announced on 30 November, at the start of the conference (held virtually this year) that takes stock of the exercise.

“This is a big deal,” says John Moult, a computational biologist at the University of Maryland in College Park, who co-founded CASP in 1994 to improve computational methods for accurately predicting protein structures. “In some sense the problem is solved.”

The ability to accurately predict proteins’ structures from their amino-acid sequences would be a huge boon to life sciences and medicine. It would vastly accelerate efforts to understand the building blocks of cells and aid more advanced drug discovery.

AlphaFold came top of the table at the last CASP, in 2018, the first year that London-based DeepMind participated. But, this year, the outfit’s deep-learning network was head-and-shoulders above other teams and, say scientists, performed so mind-bogglingly well that it could herald a revolution in biology.

“It’s a game changer,” says Andrei Lupas, an evolutionary biologist at the Max Planck Institute for Developmental Biology in Tübingen,
COVID VACCINES: WHAT SCIENTISTS

With striking speed, the United Kingdom has become the first country to approve a COVID-19 vaccine that has been tested in a large clinical trial. On 2 December, UK regulators granted emergency-use authorization to a vaccine from drug firms Pfizer and BioNTech, just seven months after the start of clinical trials. Hospitals have already administered the first doses; front-line health-care workers, care-home staff and residents are at the head of the queue.

China and Russia have approved vaccines already, but without waiting for the immunizations to complete the final round of tests in people. Regulators in the United States and the European Union are expected to issue their decisions on the Pfizer vaccine in the coming weeks.

Tests on more than 43,000 people have shown that it is 95% effective at preventing disease when measured a week after participants are given their second dose, the New York City-based firm said in November when it and BioNTech, in Mainz, Germany, submitted

Food and Drug Administration. The trial has so far gathered data from only 170 cases of COVID-19 across its control and intervention arms, and real-world efficacy might be lower than in a trial, but it is still an extraordinarily promising result, says immunologist Danny Altmann at Imperial College London: “This is brilliant news.”

The approval is a historic moment. But scientists still have many questions about how this and other vaccines will perform as they’re rolled out to millions of people.

Do the vaccines prevent transmission of SARS-CoV-2?

In addition to the Pfizer vaccine, regulators are poring over data from a similar vaccine made by Moderna of Cambridge, Massachusetts, and a third produced by AstraZeneca of Cambridge, UK, and the University of Oxford, UK. All three have been tested in large clinical trials, and have shown promise in preventing disease symptoms.

But none has demonstrated that it prevents infection altogether, or reduces the spread of the virus in a population. This leaves open the chance that those who are vaccinated could

immunity to the SARS-CoV-2 virus will last, and researchers will need to monitor this closely in the coming months and years.

There have been some reports that peo-

And vaccines, Altmann says, are deliberately designed to provoke strong responses from the immune system.

Still, it will be important for public-health officials to monitor immunity, and to know when it begins to wane. One way to do that, in addition to keeping track of infections among people who have received the shots, is to assess their levels of antibodies and immune cells periodically. Tracking how these immune responses change could give an early indication of when they are waning to worrisome levels, says Altmann. But the wide variation in people’s immune responses could make it a challenge to understand the circumstances in which a vaccine doesn’t work, and such studies will need to track many people. “You need to have a good stab at some high-level population analysis to work out whether you’re winning or losing,” says Altmann. “Otherwise, you might be a government kidding yourself in years’ time.”

How well do the vaccines work in older people and other groups?

The major vaccine trials so far have enrolled tens of thousands of people, but for each one,
might be more effective for preventing disease symptoms developing. But there are subtle differences in the immune responses provoked by each approach, notes Griffin. Researchers might eventually find that one approach works better than another in certain groups of people, or that one is the best at limiting transmission.

Differences in costs and logistics will also shape which vaccine is best for which region. Shortly after the UK government announced the authorization of the Pfizer vaccine, officials acknowledged that getting the vaccine to residents in individual care homes would be a challenge, because it needs to be stored at extremely low temperatures (−70 °C). The other two vaccines do not need to be kept at such low temperatures, and the AstraZeneca immunization is likely to be the easiest and cheapest to store, says Head.

Comparisons between the effectiveness of the different vaccines are important and should be done, but until then, the path forward is clear, says Altmann. “Grab any vaccine that your government can buy,” he says.

Could the virus evolve to evade immunity given by vaccines?

Some viruses, such as the influenza virus, are notorious for mutating. The SARS-CoV-2

CLIMATE AGENDA?

The US president-elect faces an uphill battle, but there are levers he can pull to curb global warming.

By Jeff Tollefson

When Joe Biden won the US presidency last month, it seemed like a huge opportunity to restore the country’s position as a leader in the fight against climate change. But whether he’ll be able to deliver on his aggressive climate agenda remains to be seen, especially because he will face a powerful Republican opposition in Congress.

Still, climate-policy experts say that there is a lot the former senator and vice-president to Barack Obama can do, including exerting his authority over federal agencies and leveraging his experience working with both parties in the Senate to push legislation in Congress.

“This is really the first time that a US president is leading with climate,” says Vicki Arroyo, executive director of Georgetown University’s Climate Center in Washington DC. That’s exciting, she says, but suggests cautious optimism: global warming is still a partisan issue on Capitol Hill, and “that is going to limit what Biden can accomplish”.

Biden’s election comes at a crucial juncture. President Donald Trump pulled the United States out of the Paris climate agreement last month, but other players on the world stage, from China to the European Union, are preparing to present a new round of commitments at the United Nations climate conference in Glasgow, UK, next year.

Having the United States back on board will give an important boost to these negotiations, says Jean-Pascal van Ypersele, a climatologist at the Catholic University of Louvain in Louvain-la-Neuve, Belgium, and former vice-chair of the Intergovernmental Panel on Climate Change. “The stars are much better aligned for a successful outcome in Glasgow than they would have been if Trump had been re-elected.”

Biden’s first opportunity to advance his
SARS-CoV-2 probably originated in bats, but how it passed to people is being investigated.

THE SCIENTISTS INVESTIGATING THE PANDEMIC’S ORIGINS

The World Health Organization will draw on a diverse team to examine a major mystery about SARS-CoV-2.

By Smriti Mallapaty

An epidemiologist who helped to tie the 2012 outbreak of Middle East respiratory syndrome (MERS) to camels; a food-safety officer who studies how pathogens spread in markets; and a veterinarian who found evidence linking the 2014 West Africa Ebola outbreak to bats roosting in a hollow tree. These researchers are among the team that the World Health Organization (WHO) has assembled to explore the origins of the coronavirus pandemic.

The investigation aims to find out how and when the virus SARS-CoV-2 first infected people. Strong evidence suggests that the coronavirus originated in bats, but its journey to people remains a mystery. Scientists say the team is highly qualified, but its task will be challenging.

“This is an excellent team with a lot of experience,” says Martin Beer, a virologist at the Federal Research Institute for Animal Health in Greifswald, Germany.

The group will be working with researchers in China and professionals from several other international agencies, and will start the search in Wuhan, the Chinese city where SARS-CoV-2 was first identified, and expand across China and beyond.

The international group comes with a breadth of knowledge. Marion Koopmans is a virologist specializing in molecular epidemiology at the Erasmus University Medical Centre in Rotterdam, the Netherlands. She was on the team that found, in 2013, that dromedary camels were an intermediate host for the virus that causes MERS, which has killed more than 850 people. She has since worked with another team member (Elmoubasher Farag, an epidemiologist at the Ministry of Public Health in Doha) to test camels for MERS antibodies.

During the COVID-19 pandemic, Koopmans has tracked the rapid spread of SARS-CoV-2 in mink farms in Europe. Studies on the pandemic’s origin will need to explore the role of animals kept for fur and food, she says.

Koopmans says that the group is keeping an open mind about how the pandemic started and will not exclude any scenarios, including the unlikely one that SARS-CoV-2 accidentally escaped from a laboratory. Scientists have previously told Nature that the virus is likely to have passed from bats to humans, probably through an intermediate animal, but ruling out the lab scenario will be difficult. “Anything

Also on the team is Peter Daszak, president of the non-profit research organization Ecohealth Alliance in New York City, who has spent more than a decade studying coronaviruses. He has worked closely with the Wuhan Institute of Virology (WIV) to test bats for coronaviruses with the potential to spill over into people.

“It is an honour to be part of this team,” says Daszak. “There hasn’t been a pandemic on this scale since the 1918 flu, and we’re still close enough to the origin to really find out more details about where it has come from.”

Another team member, Fabian Leendertz, a veterinary researcher at the Robert Koch Institute in Berlin, will bring his expertise in spillover events. In April 2014, Leendertz visited Meliandou village in Guinea, months after a two-year-old died of Ebola, the first person reported to be infected in West Africa.

Work by Leendertz, including interviews with locals and environmental sampling, suggests that the outbreak started in bats that lived in a hollow tree where the children used to play. The tree was burnt down days before his arrival and no Ebola virus was detected in nearby bats, which he says highlights the difficulties of finding an outbreak’s beginnings.

Considerable time has passed since the emergence of COVID-19, and many people have only mild or no symptoms, which will make it challenging to identify the first infected person, says Leendertz.

Other team members include researchers from Denmark, the United Kingdom, Australia, Russia and Japan.

Although the team members are highly qualified, eight out of ten are men and investigators from Europe dominate the group; none is from Africa or South America, says Angela Rasmussen, a virologist at Georgetown University, who is based in Seattle, Washington. “It could be more representative of the larger global scientific community,” she says.

She also says that Daszak’s ties to the WIV could raise a conflict of interest, given the unsubstantiated claims that the virus accidentally leaked from the lab.

Daszak says that he has been transparent about his work in China. The trust he has built with researchers there will help the team to gain a deeper understanding of the pandemic’s early days, he says.
Researchers have restored vision in old mice and in mice with damaged retinal nerves by resetting some of the thousands of chemical marks that accumulate on DNA as cells age. The work, published on 2 December in Nature, suggests a new approach to reversing age-related decline, by reprogramming some cells to a ‘younger’ state in which they are better able to repair or replace damaged tissue.
“It is a major landmark,” says Juan Carlos Izpisua Belmonte, a developmental biologist at the Salk Institute for Biological Studies in La Jolla, California, who was not involved in the study. “These results clearly show that tissue regeneration in mammals can be enhanced.”
Visionary approach
Ageing affects the body in myriad ways, among them adding, removing or altering chemical groups such as methyls on DNA. These ‘epigenetic’ changes accumulate as a person ages, and some researchers have proposed tracking the changes as a way of calibrating a molecular clock to measure biological age, an assessment that takes into account biological wear-and-tear and can differ from chronological age.
“We set out with a question: if epigenetic changes are a driver of ageing, can you reset the epigenome?” says David Sinclair, a geneticist at Harvard Medical School in Boston, Massachusetts, and a co-author of the Nature study (Y. Lu et al. Nature 588, 124–129; 2020). “Can you reverse the clock?”
There were suggestions that the approach could work: in 2016, Belmonte and his colleagues reported the effects of expressing four genes in mice genetically engineered to age more rapidly than normal (A. Ocampo et al. Cell 167, 1719–1733; 2016). It was already known that triggering these genes could cause cells to lose their developmental identity: the features that make, for example, a skin cell look and behave like a skin cell. But rather than turn the genes on and leave them that way, Belmonte’s team turned them on for only a few days, then switched them off again. The result was mice that aged more slowly, and had a pattern of epigenetic marks indicative of younger animals. But the technique had disadvantages: previous work had shown that if the genes are present in extra copies or expressed for too long, some mice will develop tumours.
In Sinclair’s lab, geneticist Yuancheng Lu looked for a safer approach. He dropped one of the four genes used by Belmonte’s team (one that is linked to cancer) and put the remaining three into a virus that could shuttle them into cells. He included a switch that would allow him to turn the genes on by giving mice water spiked with a drug. Withholding the drug would switch the genes off again.
Because mammals lose the ability to regenerate components of the central nervous system early in development, Lu and his colleagues tested their approach there, in the eye’s retinal nerves. They first injected the virus into the eye to see whether expression of the three genes would allow mice to regenerate injured nerves, something that no treatment had yet been shown to do.
Lu remembers the first time that he saw a nerve regenerating from injured eye cells. “It was breathtaking,” he says.
The team went on to show that its system improved visual acuity in mice with age-related vision loss, or with increased pressure inside the eye, a hallmark of the disease glaucoma. The approach also reset epigenetic patterns to a more youthful state in mice and in human cells grown in the laboratory. It is still unclear how cells preserve a memory of a more youthful epigenetic state, says Sinclair, but he and his colleagues are trying to find out.
In the meantime, Harvard has licensed the technology to Boston company Life Biosciences, which, Sinclair says, is carrying out preclinical safety assessments with a view to developing it for use in people. It would be an innovative approach to treating vision loss, says Botond Roska, director of the Institute of Molecular and Clinical Ophthalmology in Basel, Switzerland, but will probably need considerable refinement before it can be deployed safely in humans.
analysis mandatory in the research it funds through its €85-billion (US$100-billion) Horizon Europe programme. The strengthened policy is a result of recommendations made in a report (see go.nature.com/3mryv1a), produced last month by an expert group chaired by Londa Schiebinger, who studies gender and science at Stanford University in California. Nature spoke to Schiebinger about the group’s work.
How do you convince people of the need for sex and gender analysis in research?
Our iconic example of failure when you don’t do this analysis is that between 1997 and 2001, ten prescription drugs were withdrawn from the US market, eight of which were more dangerous for women than for men. When drugs fail, you’re losing money and people are suffering and dying. From preclinical studies to human clinical trials, you have to collect data on males and females and analyse them separately.
What mistakes do researchers make in these analyses?
The biggest mistake is simply ignoring sex, gender and intersectionality. Another is to not distinguish between biological sex and sociocultural gender. Gender is specific to ethnicity, age and culture. Researchers need to get the right variables, collect their data correctly and do the analysis well.
Are there research areas where people might be surprised that sex and gender analysis is essential?
For some marine organisms, sex is determined by temperature. Our report includes a fascinating study from Australia, where they found that the turtles in the north of the Great Barrier Reef were 99% female, whereas in the cooler south, it was about 67% female. It’s important that we understand how global warming is skewing these ratios, so that we can efficiently manage ecosystems.
Interview by Elizabeth Gibney.
Edited for length and clarity.
On 18 February next year, a NASA spacecraft will plummet through the Martian atmosphere, fire its retro-rockets to break its fall and then lower a six-wheeled rover named Perseverance to the surface. If all goes according to plan, the mission will land in Jezero Crater, a 45-kilometre-wide gash near the planet’s equator that might once have held a lake of liquid water.
Among the throngs of earthlings cheering on Perseverance, John Sutherland will be paying particularly close attention. Sutherland, a biochemist at the MRC Laboratory of Molecular Biology in Cambridge, UK, was one of the scientists who lobbied NASA to visit Jezero Crater, because it fits his ideas about where life might have originated, on Mars and on Earth.
The choice of landing site reflects a shift in thinking about the chemical steps that transformed a few molecules into the first biological cells. Although many scientists have long speculated that those pioneering cells arose in the ocean, recent research suggests that the key molecules of life, and its core processes, can form only in places such as Jezero, a relatively shallow body of water fed by streams.
That’s because several studies suggest that the basic chemicals of life require ultraviolet radiation from sunlight to form, and that the watery environment had to become highly concentrated or even dry out completely at times. In laboratory experiments, Sutherland and other scientists have produced DNA, proteins and other core components of
Implosion of a billion-euro
In October 2013, I attended the launch of the Human Brain Project in Lausanne, Switzerland, as correspondent for Nature. I hoped to leave with a better understanding of the exact mission of the baffling billion-euro enterprise, but I was frustrated. Things became clear the following year, when the project fell spectacularly, and very publicly, apart.
Noah Hutton’s documentary In Silico captures a sense of what it was like behind the scenes of the project, which was supported with great fanfare by the European Commission. It had been hyped as a quantum leap in understanding how the human brain works. Instead, it left a trail of angry neuroscientists across Europe. Yet aspects of what went so expensively wrong still remain elusive.
In Silico is more about the back story of the Human Brain Project (HBP). Hutton was 22 years old when he watched a 2009 talk by Henry Markram, the controversial figure who later became the first director of the HBP. Markram was speaking about the Blue Brain Project, a major initiative he had launched a few years before at one of Europe’s top universities, the Swiss Federal Institute of Technology in Lausanne, with generous funding from the Swiss government. He claimed that he would, with the help of a supercomputer related to the one that beat world chess champion Garry
they were wired. Anyone can repair a broken watch by putting its known components in the right places, neuroscientist Zachary Mainen at
pattern was right or wrong?”
When Hutton visited the next year, Markram ticked him off for contacting critics without informing him. The commission was deciding which two projects would become its billion-euro Future and Emerging Technologies Flagships and Markram didn’t want any controversy to upset his chances.
The film suggests (as other commentators have) that Markram saw the flagship programme as a means to expand Blue Brain. But to win the money, it had to be more than that. He had to team up with top scientists in other European Union countries to present an interdisciplinary collaboration. He persuaded some initially sceptical cognitive neuroscientists to join. Their job, it was understood, would
Internal tensions
Early optimism is quickly strained, as project members are sidelined. Hutton returns to find that just nine months after the launch, Mainen and some colleagues had written a public letter calling on the commission to rethink the project, claiming that autocratic management was distorting its mission. The letter attracted around 800 signatories from neuroscientists globally. (Two years later, they set out an alternative approach in this journal: Z. F. Mainen et al. Nature 539, 159–161; 2016).
By 2016, Markram had been removed from the leadership (see Nature https://doi.org/fkgx; 2015). The final two years of filming follow him back on Blue Brain. The simulation progresses, the 3D visualizations get more impressive, research papers emerge, but the project’s pep seems to drain away. Markram’s insistence that a complete brain simulation is still just ten years away sounds hollow. Meanwhile, the HBP continues with a more distributed, democratic structure.
In Silico is a fascinating window into the trouble grandiose research projects and grandiose personalities can generate, even if it fails to get to the heart of what specifically went wrong with the HBP. Hutton hints that the disputes were driven by money. I disagree; my sense is that it came down to leadership style and irresolvable differences in scientific opinion. There is a bolder, even more interesting, story waiting to be told.
Alison Abbott writes from Munich, Germany.
e-mail: alison.abbott.consultant@springernature.com
What Is a Complex System?
James Ladyman & Karoline Wiesner Yale Univ. Press (2020)
The Santa Fe Institute in New Mexico inaugurated the study of complex systems, but its founding workshops in 1984 did not define the topic. Even today there is no agreement on a definition, nor whether one is possible, remark philosopher of science James Ladyman and mathematician Karoline Wiesner. After a clear analysis of systems ranging from radiation to human brains, they conclude: there is no “single natural phenomenon of complexity”, but ‘complexity science’ does exist, rather than being “merely branches of different sciences”.
A Manual of the Mammalia
Douglas A. Kelt & James L. Patton Univ. Chicago Press (2020)
The subtitle of this comprehensive, lavishly illustrated reference book terms it “an homage” to Timothy Lawlor’s acclaimed Handbook to the Orders and Families of Living Mammals, which was published in 1979, revised, but out of date following Lawlor’s death in 2011. As wildlife ecologist Douglas Kelt and mammal curator James Patton note, Lawlor’s final edition featured about 4,170 species of mammal; today’s figure is 6,495. “Do not be overwhelmed”, they advise students, “simply revel in the diversity that is the Mammalia.” Andrew Robinson
Yellowstone Wolves
Eds Douglas W. Smith et al. Univ. Chicago Press (2020)
Twenty-five years ago, the authors reintroduced wolves to Yellowstone National Park in Wyoming, the first deliberate return of an apex carnivore to a big ecosystem. Here, they relate what they’ve learnt of the animals’ predation, mating, play, genetics, disease and more, and their impact on other species and the landscape. Also detailed are the fraught history, politics and implications of rewilding. Glorious pictures bear witness to fragile gains. US President Donald Trump’s silver-anniversary gift? Rolling back protections on the wolves. Sara Abdulla
Comment
BUDA MENDES/GETTY
Smoke rises from a fire in Brazil’s Pantanal, the world’s largest tropical wetland, in September.
Climate extremes, poor management and lax laws are making this World Heritage Site prone to fierce fires. Researchers and governments must develop a plan to manage these risks together.
Brazil has changed. As well as the COVID-19 pandemic killing more than 170,000 of its citizens so far, 2020 has seen almost one-third of the Pantanal, the largest tropical wetland in the world, on fire. Four million hectares of forest, savannah and shrub-land (an area bigger than the US state of Maryland) have gone up in flames since January (see go.nature.com/2jtw6va). Almost all the Indigenous territories and conservation facilities were burnt, as was much private land. Conservation areas such as Encontro das Águas State Park have been devastated; it contained one of the largest populations of jaguars in the world.
Fires’ impacts have been felt nationwide. Smoke has spread thousands of kilometres, reducing air quality in São Paulo, Rio de Janeiro and Curitiba. Southern states have experienced showers of black rain. The fires are decimating Brazil’s economy, curbing inward investment as well as sectors such as
cattle on native pastures and moving animals to higher land when lowlands flood. Tourists flock to the region for its spectacular scenery, safaris and sport-fishing.
Each rainy season, from October to April, pulses of floods swell the Paraguay River to support ecosystems found nowhere else on Earth. Endangered jaguar, giant otter, marsh deer and hyacinth macaws roam wild. Thousands of birds pass through on their migrations1. It’s a haven for caimans, capybaras, monkeys, deer, coatis, tapirs, snakes and the jabiru stork (Jabiru mycteria), the region’s symbol.
The fires have affected all aspects of life. COVID-19 has made things worse. PREVFOGO, the national centre for forest fire prevention and fighting, has struggled to hire and train firefighters. Many fires broke out in remote regions, even underground, that were hard to reach. Local firefighters in the Kadiwéu territory, for example, struggled almost alone
[Figure: fires in the Pantanal. Legend: conservation areas; Indigenous territories; fires in 2019, 2020 and both years. Inset chart: difficulty of controlling fires (DSR index*), 1980–2020. Annotations: Fire danger ratings are rising as the region warms. 2020 saw the worst conditions in three decades. Indigenous Kadiwéu people are trained to fight fires in their territory. *Averaged daily severity rating (DSR) from January to August of each year for the Pantanal biome. SOURCE: LABORATORY FOR ENVIRONMENTAL SATELLITE APPLICATIONS, FED. UNIV. RIO DE JANEIRO]
Correspondence
Another diversity problem: scientists’ politics
According to your poll before the US presidential election (see Nature 586, 654; 2020), the political leaning of scientists was 86% in favour of Democrat Joe Biden, now president-elect, with just 8% supporting Republican Donald Trump, the outgoing president. However, this finding is glaringly out of step with the voting of the population from which the US scientists were drawn (about 51% versus 47%, respectively).
This misalignment could be attributed to differences in education, understanding and awareness of the issues at stake. But such a gulf risks isolating science further from society at a time when we should be building bridges beyond this election.
As academics become more aware of the importance of diversity of thought, we must be careful not to recreate different forms of the old elitist patterns of collective behaviour recently challenged by anti-racism. Any association of science with political archetypes could turn some against it by enhancing the view that it is an exclusive pursuit.
Andrew Isaac Meso King’s College London, UK.
andrew.meso@kcl.ac.uk
Land use predicts pandemic disparities
COVID-19 morbidity is linked to social, economic and environmental factors, including residential location, air pollution and median household income (H. A. Washington Nature 581, 241; 2020). These have an overlapping determinant that could prove to be an important predictor of COVID-19 disparities: land use.
The United States has a strained history of land use and land governance, including ethnic constraints on land ownership and unfair mortgage-lending practices. Decisions on land-use classification have led to hazardous and polluting facilities being sited next to minority and other vulnerable residential communities. Despite policies enacted in 1968 to protect against housing discrimination (go.nature.com/39v1bt3), the United States is witnessing a correlation of historical ‘redlining’ (the systematic denial of services to residents of certain areas, on the basis of race or ethnicity) with COVID-19 incidence today.
It is crucial that land-use practices are considered when making public-health management decisions. This could help to mitigate the multi-generational, compounding impacts of isolated or confined residential spaces. Those who live in such areas will continue to take a disproportionate hit unless land-use equity is made a priority in governance.
Cesunica Ivey University of California, Riverside, USA.
cesunica@ucr.edu
What counts as climate finance? Define urgently
To resolve arguments over what funding actually flows from developed to developing nations, the United Nations Framework Convention on Climate Change needs to draw up a definition of what constitutes climate finance.
At the 2009 UN climate summit, developed countries pledged to mobilize US$100 billion annually by 2020 to help developing countries mitigate and adapt to climate change. Has the promise been met? The answer to this question will be available only in “the first quarter of 2022 at the earliest”, according to a report published last month (go.nature.com/2kdeklu) by the Organisation for Economic Co-operation and Development (OECD), a club of wealthy countries.
Letting the OECD decide what counts as climate finance on the world’s behalf risks introducing questionable accounting practices (see R. Weikmans and J. T. Roberts Clim. Dev. 11, 97–111; 2019). The OECD, for example, continues to account loans at face value, which equates a $10-million loan (which has to be paid back) to a $10-million grant. It is therefore no surprise that developing countries have found OECD reports unacceptable before (see Nature 573, 328–331; 2019).
Romain Weikmans Free University of Brussels, Belgium.
romain.weikmans@ulb.be
J. Timmons Roberts Brown University, Providence, Rhode Island, USA.
Stacy-ann Robinson Colby College, Waterville, Maine, USA.
Combine resilience and efficiency in post-COVID societies
As countries prepare to remodel themselves after the COVID-19 pandemic, they must tackle growth and development expectations by using resources more sustainably, and by ensuring that their societies are better placed to weather future disruptions.
The COVID-19 experience indicates that society could become more vulnerable to systemic shocks and cascading disruption if the practices on which it depends excessively prioritize system efficiency over resilience. Efficiency emphasizes performance at maximum capacity with minimal use of scarce resources. To meet the rising demands of society, efficiency-based approaches often rely on increasingly complex and interconnected systems. But when a tightly interdependent society encounters acute or chronic stressors beyond its expectations or operating capabilities, such highly efficient systems are prone to catastrophic failure that can delay or prevent recovery.
More-resilient systems might be less efficient, but they recover better from systemic disruptions. Building resilience does not mean abandoning efficiency, but rather maximizing socio-economic systems’ long-term sustainability in the face of future disruptions. Marrying resilience with efficiency would allow society to preserve or even improve living standards in current and future crises.
Benjamin D. Trump, Igor Linkov US Army Corps of Engineers, Boston, Massachusetts, USA.
igor.linkov@usace.army.mil
William Hynes OECD, Paris, France.
from the age of dinosaurs
Daniel J. Field
The fossil record traces the origin of the modern bird skull as birds evolved from their dinosaurian ancestors. Now the discovery of a bizarre fossil reveals a surprising diversion during this process of facial transformation. See p.272
As living dinosaurs, birds are the product of a long and complex evolutionary history that has given rise to more than 11,000 living species1. The past decade has witnessed a surge of interest in the evolution of the avian skull, a structure that is hugely variable across the diversity of living birds2. However, our ability to test hypotheses of how and when key transformations of the bird skull took place is limited if we can’t incorporate fossils into evolutionary models. On page 272, O’Connor et al.3 report a stunning fossil-bird discovery from the age of the dinosaurs that reminds us of the crucial value of fossils for casting light on unexpected complexities in avian evolutionary history.
This striking addition to the aviary of the Mesozoic era is between 72 million and 66 million years old (corresponding to the latest stage of the Cretaceous period). It comes from Madagascar, and is named Falcatakely forsterae, which roughly translates as Forster’s small scythe beak. The name references the distinctive shape of the fossil’s bill and honours Catherine Forster’s numerous contributions to vertebrate palaeontology in Madagascar. The specimen is small (less than 9 centimetres long) and delicate (paper thin in places), yet the stunning bone preservation provides a spectacular look at this ancient creature’s anatomy.
Although the fossil consists of only the front half of a skull, it’s clear that Falcatakely is more than just a pretty face. The skull is utterly bizarre, characterized by a deep and elongated snout (Fig. 1) unlike those seen in any other Mesozoic birds. The skull’s architecture becomes even weirder. The very tip of its snout has one small preserved tooth (the tip possibly had more teeth that were not preserved); however, there are clearly no teeth anywhere else along its jaws. By contrast, the closest relatives of modern birds from the time of the dinosaurs show the opposite pattern, with teeth found throughout the jaws, but none at the tip of the beak (Fig. 1)4. These features give the skull of Falcatakely an almost comical profile: imagine a creature resembling a tiny, buck-toothed toucan flitting from branch to branch, occasionally glancing down at Madagascar’s formidable Late Cretaceous inhabitants, which included equally bizarre mammals5 and giant
shape previously unknown for any bird from the age of the dinosaurs.
The exceptional degree of preservation of Falcatakely enabled the authors to make other astonishing findings. Imaging using a method called high-resolution microcomputed tomography enabled them to digitally ‘extract’ the fragile skull bones from the surrounding rock. O’Connor and colleagues could then reassemble the delicate components of the bill, including elements such as the paper-thin palate bones, which are rarely found preserved, into a compelling 3D model (see Supplementary Videos 1–8 of ref. 3).
Studying the palate, the authors spotted a surprising bone called the ectopterygoid. This is absent in living birds, but is a component of the palate of non-avian dinosaurs and early bird-like forms, such as the iconic early birds Archaeopteryx and Sapeornis8. However, on the basis of detailed analyses, O’Connor et al. infer that Falcatakely belongs to a group of Mesozoic ‘pre-modern’ birds called Enantiornithes
[Figure panels, arranged towards living birds: Falcatakely, Ichthyornis, Asteriornis. Labelled bones: lacrimal, nasal, maxilla (upper jaw), premaxilla (upper beak).]
Figure 1 | The evolution of ancient bird skulls. Discoveries of bird skulls from the Mesozoic era (the age of the dinosaurs) have revealed both how the skull of modern birds arose and the surprising variability of these ancient skulls (as illustrated by these fossils, reported between 2018 and 2020). O’Connor et al.3 present their discovery of the skull of a bird specimen they name Falcatakely forsterae, which shows an unusually deep and elongated snout, with teeth (at least one tooth and possibly more) positioned only at the very tip of the upper jaw in a skull region called the premaxilla. Like other distant relatives of modern birds, such as non-avian dinosaurs, the upper jaw of Falcatakely consists mainly of a region called the maxilla. Closer relatives of modern birds, such as Ichthyornis4, had teeth throughout the jaws, except at the tip, and retained the ancestrally large maxilla. Early modern birds, including Asteriornis (an ancient relative of chickens and ducks)13, lost their teeth completely, and had upper jaws dominated by the premaxilla. Nasal bones are shown in grey and lacrimal bones (inferred for Asteriornis) are in beige. (Figure adapted from Fig. 2 of ref. 3, Fig. 3 of ref. 4 and Fig. 1 of ref. 13.)
by a ‘source’ produced by the collision, a volume of space in which quarks and gluons that originally came from the protons interact and become confined within new hadrons. The source emits various types of hadron, including protons and hyperons, some of which form proton–hyperon pairs. Finally, the proton and hyperon in each of these pairs interact with each other in ways that alter the momentum of the paired system. This momentum is measured by a detector and used to determine the momentum correlations.
The momentum correlations reflect the size of the hadron source and the properties of the interaction between the produced proton–hyperon pairs. Such correlation analyses were originally used to determine the source size in collisions of heavy ions6, but in the new work,
be calculated from first principles so that the results can be compared with experimental findings. The precision with which nucleon–nucleon interactions can be determined from experimental data is still superior to that obtained from these calculations, but the ALICE Collaboration’s measurements of the proton–hyperon interactions almost exactly match those obtained from theory.
A wealth of high-precision measurements of proton–hyperon interactions is expected from the LHC in the next decade, following on from its recent upgrade. Moreover, various other facilities that will study particle collisions at lower energies than those produced at the
Manuel Lorenz is at the Institute for Nuclear Physics, Goethe University, Frankfurt 60438, Germany.
e-mail: m.lorenz@gsi.de
1. ALICE Collaboration. Nature 588, 232–238 (2020).
2. Epelbaum, E., Hammer, H.-W. & Meissner, U.-G. Rev. Mod. Phys. 81, 1773–1825 (2009).
3. Stoks, V. & de Swart, J. Phys. Rev. C 47, 761–767 (1993).
4. Weissenborn, S., Chatterjee, D. & Schaffner-Bielich, J. Phys. Rev. C 85, 065802 (2012).
5. Tanabashi, M. et al. Phys. Rev. D 98, 030001 (2018).
6. Lisa, M. A., Pratt, S., Soltz, R. & Wiedemann, U. Annu. Rev. Nucl. Part. Sci. 55, 357–402 (2005).
7. Adamczewski-Musch, J. et al. Phys. Rev. C 94, 025201 (2016).
8. Acharya, S. et al. Phys. Rev. C 99, 024001 (2019).
9. Sasaki, K. et al. Nucl. Phys. A 998, 121737 (2020).
10. Iritani, T. et al. Phys. Lett. B 792, 284–289 (2019).
Virology
eROSITA (ref. 8) is a large-collecting-area and wide-field-of-view X-ray telescope, launched into space onboard the Spektr-RG mission on 13 July 2019. Over the course of six months (December 2019–June 2020), Spektr-RG and eROSITA have completed a survey of the whole sky at energies of 0.2–8 keV, much deeper than the only other all-sky survey with an X-ray-imaging telescope, which was performed by ROSAT in 1990 at energies of 0.1–2.4 keV.
The sky map from the first eROSITA all-sky survey is shown in Fig. 1. This image has been created from calibrated events in the energy range 0.3–2.3 keV (Methods). A preliminary analysis indicates that more than one million X-ray point sources and about 20,000 extended ones are detected in the survey. This is comparable to, and may exceed, the total number of X-ray sources known before eROSITA launched. Multiwavelength identifications using the WISE and Gaia catalogues9,10 suggest that about 80% of the point sources are distant active galactic nuclei (AGN; comprising about 80% of all known blazars) and that around 20% are coronally active stars in the Milky Way, including about 150 planet-hosting stars (roughly 10% of all known outside of the Kepler field).
Various very large and diffuse extended structures are visible in the all-sky survey map. The most obvious is a quasi-circular feature, which is part of the North Polar Spur and Loop I (northwest quadrant) discovered in the early days of X-ray and radio astronomy, respectively5,11.
Although less evident at first glance, close inspection of the medium-energy-band (0.6–1.0 keV) image in the hemisphere below the plane of the Milky Way (‘south’) reveals an astonishing new feature: a huge circular annulus of similar shape and scale to the structure seen in the north (Fig. 2). Together, they seem to form a pair of ‘bubbles’ that emerge from the Galactic centre. They are traceable at various levels of intensity throughout most of the sky, and should represent a very large object (several kiloparsecs), akin to the Fermi bubbles, because local features are unlikely to exhibit the fourfold symmetry around the direction towards the centre of the Galaxy.
The Fermi bubbles were discovered in 2010 (ref. 1) with the Fermi-LAT (Fermi large-area telescope) γ-ray instrument. They have a hard, non-thermal spectrum, which shows up clearly in maps at energies of more than 1 GeV. Their emission is probably due to inverse Compton scattering of cosmic-ray electrons on the cosmic microwave background and other radiation fields. This kiloparsec-scale structure was quickly interpreted as a possible manifestation of past activity of the now dormant supermassive black hole in the centre of the Milky Way, thus linking it with AGN observed outside the Galaxy12–15. Alternatively, a burst of star formation could power the bubbles16–18. In either case, the energy needed to power their formation must have been very large, at roughly 10^55 erg (refs 1, 19).
X-ray emission from the North Polar Spur had already been found by ROSAT5. Although considered in most early models to be a nearby
1 Max-Planck-Institut für Extraterrestrische Physik, Garching, Germany. 2 Space Research Institute of the Russian Academy of Sciences, Moscow, Russia. 3 Max-Planck-Institut für Astrophysik, Garching, Germany. 4 Max-Planck-Institut für Radioastronomie, Bonn, Germany. 5 Ioffe Institute, St Petersburg, Russia. 6 M. V. Lomonosov Moscow State University, P. K. Sternberg Astronomical Institute, Moscow, Russia. 7 Institute of Astronomy, Russian Academy of Sciences, Moscow, Russia. 8 Institut für Astronomie und Astrophysik, Tübingen, Germany. 9 INAF-Osservatorio Astronomico di Brera, Merate, Italy. 10 Dr. Karl-Remeis-Sternwarte Bamberg and Erlangen Centre for Astroparticle Physics, Universität Erlangen-Nürnberg, Bamberg, Germany. 11 Deceased: M. Pavlinsky. ✉e-mail: predehl@mpe.mpg.de; sunyaev@iki.rssi.ru; churazov@iki.rssi.ru; gilfanov@iki.rssi.ru; am@mpe.mpg.de; knandra@mpe.mpg.de
Fig. 1 | The Spektr-RG–eROSITA all-sky map. An RGB map of the first Spektr-RG–eROSITA all-sky survey (red for 0.3–0.6 keV, green for 0.6–1.0 keV, blue for 1.0–2.3 keV) is shown in Galactic coordinates, using a Hammer–Aitoff projection. The original image, with a resolution of about 12″, was smoothed (with a Gaussian with a full-width at half-maximum (FWHM) of 10′) to generate this one. Image adapted from ref. 34. Credit: Jeremy Sanders, Hermann Brunner, Andrea Merloni and the eSASS team (MPE); Eugene Churazov, Marat Gilfanov (on behalf of IKI).
supernova remnant nearly surrounding us, the possibility that the North Polar Spur is of Galactic scale has been proposed6,20, and is supported by several observational arguments7,21. In particular, study of absorption in X-ray and radio bands places a lower limit of 300 pc on the distance to the structure21, which rules out a nearby supernova remnant. In addition, evidence for a large-scale bipolar wind has been presented, based purely on X-ray and mid-infrared data, even before the discovery of the Fermi bubbles22.
With the eROSITA data, the full scope and morphology of these gigantic X-ray structures has become evident. ROSAT, owing to a combination of its lower sensitivity and softer energy response, could reveal only the brightest part of the southern loop closest to the Galactic plane1,22, not the whole structure. More recently, the 0.7–1-keV all-sky map from the solid-state slit camera (SSC) of MAXI also provided evidence of a southern enhancement on these large scales, and a close north–south symmetry23.
The Fermi bubbles and large-scale X-ray emission revealed by eROSITA show remarkable morphological similarity. We therefore suggest that the Fermi bubbles and the eROSITA structure are physically related, and refer to the latter as ‘eROSITA bubbles’. Our discovery confirms the previously suggested common origin of the two objects6,7. The motivation for a separate name is that, despite the probably common origin, the two structures differ in some important respects.
First, we compare their morphologies on the sky (Fig. 3). The Fermi bubbles are roughly elliptical, about 55° × 45° (north–south, east–west) in diameter, symmetric about the Galactic centre, with vertical axis perpendicular to the Galactic plane, and roughly uniform in γ-ray intensity. The eROSITA bubbles appear as extended as 80° in longitude, roughly 80°–85° in latitude and concentrated in annuli or shells. This suggests that they are, to first-order, close to spherical, with a radius of about 6–7 kpc along the plane, extending radially on the Milky Way close to the Sun, so that their northern and southern edges are imprinted by the closer rim of the bubble. The full vertical extent of the eROSITA bubbles is more difficult to determine; assuming a spherical geometry, they would extend roughly 14 kpc above and below the Galactic plane (Extended Data Fig. 1).
Second, from a preliminary spectral analysis of eROSITA data, the absorbing column density of the diffuse emission in the southwestern bright rim of the eROSITA bubbles (white rectangle in Fig. 2) can be constrained to N_H = (1.0–3.5) × 10^21 cm^−2, consistent with what has been measured previously21 for the northern structure. One-dimensional cross-sections of the observed surface brightness at various latitudes (Fig. 2) are qualitatively consistent with the projection of (quasi-)spherical thick shells with an outer diameter of 14 kpc. Regardless of the uncertainties on these numbers, it is clear that the eROSITA bubbles are comparable in size to the Galactic disk24.
We note that the extended X-ray emission revealed by eROSITA coincides spatially with the soft component of the GeV emission reported to surround the Fermi bubbles2,7,25. A possible connection with polarized radio-continuum emission at 2.3 GHz and 23 GHz (ref. 26) has yet to be explored.
An episodic or continuous energy release in the region of the Galactic centre is expected to generate a series of distinct structures: shocks and contact discontinuities. We see two prominent structures in our maps: one is the outer boundary of the eROSITA bubbles; the other separates the eROSITA bubbles and the Fermi bubbles. The sharp boundary of the eROSITA bubbles, which appears bright in X-rays (indicative of hotter gas at the boundary than outside it), clearly traces the presence of a non-radiative (or adiabatic) shock (see Methods for an estimate of the gas cooling time). We associate the boundary with a forward shock linked to the onset of large energy release at the Galactic centre. The nature of the boundary between the eROSITA and Fermi bubbles is less clear. It could be another forward shock (in the case of a sequence of energy releases), a reverse shock, a wind-termination shock or a contact discontinuity. The reverse or termination shock models for the Fermi bubbles would imply an additional contact discontinuity somewhere between the Fermi and eROSITA bubbles, which is not apparent in the data. Instead, we consider the simplest scenario in which the eROSITA
Fig. 2 | The soft-X-ray eROSITA bubbles. a, False-colour map of extended emission detected by eROSITA in the 0.6–1.0-keV range. The contribution of the point sources has been removed and the scaling adjusted to enhance large-scale structures in the Galaxy. b, One-dimensional surface-brightness profiles in the same energy band (red lines with pink shading representing statistical uncertainties), cut at various galactic latitudes (as labelled). For comparison, we also show the predictions of four possible geometric models (not normalized to the data): a full sphere (yellow), a very thick shell (thickness, 4 kpc; brown), a thick shell (thickness, 2 kpc; cyan) and a thin shell (thickness, 0.2 kpc; green). The thick shell (cyan) is the most consistent with the data (see Extended Data Fig. 2 for a two-dimensional projection of this model). The region indicated by the white rectangle is where a preliminary spectral analysis was performed to constrain the line-of-sight absorption column density towards the southern eROSITA bubbles. Vertical axis in b: surface brightness (counts s^−1 deg^−2).
and Fermi bubbles are causally connected, with the Fermi bubbles driving the expansion of the eROSITA bubbles and both structures being associated with the same (gradual or instantaneous) energy release in the nuclear region of the Milky Way. In this scenario, the outer boundary of the Fermi bubbles plausibly represents a contact discontinuity that separates the shock-heated interstellar medium from the shocked outflow, and the boundary of the eROSITA bubbles is the shock that propagates through the halo gas. The pressure is thus continuous across the interface between the eROSITA and Fermi bubbles and the total thermal energies of the two features simply reflect their volumes (ignoring the effects of stratification, which may be non-negligible). Given that their characteristic sizes differ by a factor of about 2, the
Fig. 3 | Comparison of the morphology of the γ-ray and X-ray bubbles. A composite Fermi–eROSITA image is shown. The X-ray extended emission revealed by eROSITA (0.6–1-keV band; cyan) encloses the hard component of the extended gigaelectronvolt emission traditionally referred to as Fermi bubbles (red; Fermi map adapted from ref. 35), unequivocally establishing their close relation.
total thermal energy of the eROSITA bubbles is almost 10 times larger than that of the Fermi bubbles.
The observed average X-ray surface brightness of (2–4) × 10^−15 erg cm^−2 s^−1 arcmin^−2 in the eROSITA bubbles (Methods), which decreases with Galactic latitude, is in broad agreement with the above scenario. The observed surface brightness, integrated over the full extent of the eROSITA bubbles, implies a total luminosity of hot X-ray-emitting plasma of L ≈ 1 × 10^39 erg s^−1.
To inflate the eROSITA bubbles, an average luminosity of the order of 10^41 erg s^−1 during the past tens of millions of years would be required, and could arise from either star-forming or AGN activity in the Galactic centre. As discussed above, the arguments in favour of each interpretation in the context of the Fermi bubbles have been debated extensively. In the case of the eROSITA bubbles, the energetics are such that they are at the limit of what the past starburst activity at the centre of the Milky Way could provide. Alternatively, the eROSITA bubbles could be inflated by a period (about 1–2 Myr) of Seyfert-like activity (L ≈ 10^43 erg s^−1) of the central supermassive black hole (Sgr A*). The long cooling time of the hot plasma is consistent with such a hypothesis.
The structures seen here are reminiscent of similar effects seen in AGN that host rapidly accreting supermassive black holes1. These can inject a vast amount of mechanical energy into the ambient gas, as revealed by radio-bright bubbles embedded in the X-ray cocoons27. This process, known as AGN feedback, is seen in objects ranging from individual early-type galaxies, such as Centaurus A28, to massive clusters, such as A426 (Perseus)29,30, and is thought to have potentially marked effects on the evolution of galaxies. On the other hand, explosions of supernovae associated with star formation yield kinetic energy of the order of 10^51 erg per supernova in the ejecta (also known as stellar feedback), which may drive an outflow from the central region of a galaxy31. M82 provides a good example of the latter mechanism32. The energetics and the most salient features of the observed eROSITA bubbles are such that neither of the two mechanisms could be excluded a priori.
Irrespective of the specific source of energy, our results corroborate the notion that inactive disk galaxies, such as the Milky Way, have hot plasma in their haloes that is highly perturbed by activity in their disks, demonstrating the presence of a feedback mechanism in apparently quiescent galaxies. Galaxies are thought to grow via the slow recondensation of the hot halo plasma, which was shock-heated during the collapse of the dark-matter halo33. The cooling time of the hot plasma in the halo is comparable to the Hubble time, so the process of growing a galaxy is assumed to be steady (apart from mergers) and slow. Here we have direct evidence of the re-heating of such plasma, to considerable heights above the Galactic disk.
The detection of these X-ray bubbles was enabled by the combined capabilities of the eROSITA instrument and the Spektr-RG mission profile. More detailed analysis following accurate calibration of the instrument, substantial increases in data quality from the ongoing sky survey and follow-up observations in other parts of the electromagnetic spectrum will reveal further details of the properties of the eROSITA bubbles and the implications for the structure and evolution of galaxies, including the Milky Way.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2979-0.

1. Su, M., Slatyer, T. R. & Finkbeiner, D. P. Giant gamma-ray bubbles from Fermi-LAT: active galactic nucleus activity or bipolar Galactic wind? Astrophys. J. 724, 1044–1082 (2010).
2. Ackermann, M. et al. The spectrum and morphology of the Fermi bubbles. Astrophys. J. 793, 64 (2014).
3. Heywood, I. et al. Inflation of 430-parsec bipolar radio bubbles in the Galactic centre by an energetic event. Nature 573, 235–237 (2019).
4. Ponti, G. et al. An X-ray chimney extending hundreds of parsecs above and below the Galactic centre. Nature 567, 347–350 (2019).
5. Egger, R. & Aschenbach, B. Interaction of the Loop I supershell with the local hot bubble. Astron. Astrophys. 294, L25–L28 (1995).
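As a rough cross-check of the energetics quoted in the main text, the sketch below multiplies each quoted luminosity by a duration and cubes the quoted size ratio of about 2. It is illustrative only: the 30 Myr duration is an assumed stand-in for "tens of millions of years", the function names are hypothetical, and the supernova count simply divides by the quoted 10^51 erg per supernova with no allowance for how efficiently that energy couples to the gas.

```python
# Back-of-the-envelope energy bookkeeping for the bubble-inflation scenarios.
# All inputs are order-of-magnitude values; the durations are assumptions.

SECONDS_PER_MYR = 3.156e13  # approximate number of seconds in one million years


def injected_energy_erg(luminosity_erg_per_s, duration_myr):
    """Energy released at constant luminosity over a duration given in Myr."""
    return luminosity_erg_per_s * duration_myr * SECONDS_PER_MYR


def thermal_energy_ratio(size_ratio):
    """Ratio of thermal energies at equal pressure, which scales as volume."""
    return size_ratio ** 3


# Starburst-like case: 1e41 erg/s sustained for an assumed 30 Myr.
starburst_energy = injected_energy_erg(1e41, 30.0)

# Seyfert-like case: 1e43 erg/s for the quoted 1-2 Myr.
seyfert_energy_range = (
    injected_energy_erg(1e43, 1.0),
    injected_energy_erg(1e43, 2.0),
)

# Naive number of supernovae at 1e51 erg each needed to match the starburst case.
supernovae_needed = starburst_energy / 1e51

print(f"starburst case: {starburst_energy:.2e} erg")
print(f"Seyfert case:   {seyfert_energy_range[0]:.2e} to {seyfert_energy_range[1]:.2e} erg")
print(f"supernovae:     {supernovae_needed:.2e}")
print(f"energy ratio for a size ratio of 2: {thermal_energy_ratio(2.0):.0f}")
```

A size ratio of 2 gives a volume ratio of 8, which is the sense in which the thermal energy of the eROSITA bubbles is "almost 10 times larger" than that of the Fermi bubbles.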
Extended Data Fig. 2 | Soft-X-ray data compared to a thick-shell model for the eROSITA bubbles. Comparison between the thick-shell model (cyan line in Fig. 2) and eROSITA data (0.6–1.0-keV band) in a Lambert zenithal equal-area projection. The model is in red; the data are in cyan. The northern bubble is shown on the left (N); the southern bubble is shown on the right (S). The northern bubble is spherical, with an outer radius of 7 kpc and an inner radius of 5 kpc. It is slightly offset from the vertical above the Galactic centre. The southern shell is instead an ellipse, slightly elongated in the north–south direction (semi-major axis is 7 kpc; semi-minor axis 4.9 kpc).
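To illustrate why a thick shell appears brighter towards its rim, the sketch below computes the chord length through a spherical shell as a function of projected distance from its centre. This is a toy version under stated assumptions, not the model used for the figure: it assumes uniform, optically thin emission (so brightness is proportional to path length) and a distant observer with parallel sightlines, which does not hold for the Sun sitting close to the rim. The 5 kpc and 7 kpc radii are the northern-shell values from the caption; the function name is hypothetical.

```python
import math


def shell_path_length(impact_kpc, r_in_kpc, r_out_kpc):
    """Chord length (kpc) through a spherical shell at a given impact parameter.

    For uniform, optically thin emission the projected surface brightness is
    proportional to this length.
    """
    if impact_kpc >= r_out_kpc:
        return 0.0  # the sightline misses the shell entirely
    outer_chord = 2.0 * math.sqrt(r_out_kpc ** 2 - impact_kpc ** 2)
    if impact_kpc >= r_in_kpc:
        return outer_chord  # the sightline passes through the shell wall only
    inner_chord = 2.0 * math.sqrt(r_in_kpc ** 2 - impact_kpc ** 2)
    return outer_chord - inner_chord  # subtract the empty cavity


# Profile across the northern shell (inner radius 5 kpc, outer radius 7 kpc).
for b in (0.0, 2.5, 5.0, 6.0, 7.0):
    print(f"b = {b:4.1f} kpc -> path = {shell_path_length(b, 5.0, 7.0):.2f} kpc")
```

In this toy geometry the path length peaks at the inner radius rather than at the centre, which is the limb brightening that gives a shell its annular appearance in projection.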
Article
Accepted: 20 October 2020
Published online: 9 December 2020
Open access

One of the key challenges for nuclear physics today is to understand from first principles the effective interaction between hadrons with different quark content. First successes have been achieved using techniques that solve the dynamics of quarks and gluons on discrete space-time lattices1,2. Experimentally, the dynamics of the strong interaction have been studied by scattering hadrons off each other. Such scattering experiments are difficult or impossible for unstable hadrons3–6 and so high-quality measurements exist only for hadrons containing up and down quarks7. Here we demonstrate that measuring correlations in the momentum space between hadron pairs8–12 produced in ultrarelativistic proton–proton collisions at the CERN Large Hadron Collider (LHC) provides a precise method with which to obtain the missing information on the interaction dynamics between any pair of unstable hadrons. Specifically, we discuss the case of the interaction of baryons containing strange quarks (hyperons). We demonstrate how, using precision measurements of proton–omega baryon correlations, the effect of the strong interaction for this hadron–hadron pair can be studied with precision similar to, and compared with, predictions from lattice calculations13,14. The large number of hyperons identified in proton–proton collisions at the LHC, together with accurate modelling15 of the small (approximately one femtometre) inter-particle distance and exact predictions for the correlation functions, enables a detailed determination of the short-range part of the nucleon-hyperon interaction.
Baryons are composite objects formed by three valence quarks bound together by means of the strong interaction mediated through the emission and absorption of gluons. Between baryons, the strong interaction leads to a residual force and the most common example is the effective strong force among nucleons (N), that is, baryons composed of up (u) and down (d) quarks: proton (p) = uud and neutron (n) = ddu. This force is responsible for the existence of a neutron–proton bound state, the deuteron, and manifests itself in scattering experiments7 and through the existence of atomic nuclei. So far, our understanding of the nucleon–nucleon strong interaction relies heavily on effective theories16, where the degrees of freedom are nucleons. These effective theories are constrained by scattering measurements and are successfully used in the description of nuclear properties17,18.
The fundamental theory of the strong interaction is quantum chromodynamics (QCD), in which quarks and gluons are the degrees of freedom. One of the current challenges in nuclear physics is to calculate the strong interaction among hadrons starting from first principles. Perturbative techniques are used to calculate strong-interaction phenomena in high-energy collisions with a level of precision of a few per cent19. For baryon–baryon interactions at low energy such techniques cannot be employed; however, numerical solutions on a finite space-time lattice have been used to calculate scattering parameters among nucleons and the properties of light nuclei1,2. Such approaches are still limited: they do not yet reproduce the properties of the deuteron20 and do not predict physical values for the masses of light hadrons21.
Baryons containing strange (s) quarks, exclusively or combined with u and d quarks, are called hyperons (Y) and are denoted by uppercase Greek letters: Λ = uds, Σ^0 = uds, Ξ− = dss, Ω− = sss. Experimentally, little is known about Y–N and Y–Y interactions, but recently, major steps forward in their understanding have been made using lattice QCD approaches13,14,22. The predictions available for hyperons are characterized by smaller uncertainties because the lattice calculation becomes more stable for quarks with larger mass, such as the s quark. In particular, robust results are obtained for interactions involving the heaviest hyperons, such as Ξ and Ω, and precise measurements of the p–Ξ− and p–Ω− interactions are instrumental in validating these calculations.
From an experimental point of view, the existence of nuclei in which a nucleon is replaced by a hyperon (hypernuclei) demonstrates the presence of an attractive strong Λ–N interaction23 and indicates the possibility of binding a Ξ− to a nucleus24,25. A direct and more precise measurement of the Y–N interaction requires scattering experiments, which are particularly challenging to perform because hyperons are short-lived and travel only a few centimetres before decaying. Previous experiments with Λ and Σ hyperons on proton targets3–5 delivered results that were two orders of magnitude less precise than those for nucleons, and such experiments with Ξ (ref. 6) and Ω beams are even more challenging. The measurement of the Y–N and Y–Y interactions has further important implications for the possible formation of a

*A list of members and their affiliations appears at the end of the paper.
Fig. 1 | Schematic representation of the correlation method. a, A collision of two protons generates a particle source S(r*) from which a hadron–hadron pair with momenta p1 and p2 emerges at a relative distance r* and can undergo a final-state interaction before being detected. Consequently, the relative momentum k* is either reduced or increased via an attractive or a repulsive interaction, respectively. b, Example of attractive (green) and repulsive (dotted red) interaction potentials, V(r*), between two hadrons, as a function of their relative distance. Given a certain potential, a non-relativistic Schrödinger equation is used to obtain the corresponding two-particle wavefunction, ψ(k*, r*). c, The equation of the calculated (second term) and measured (third term) correlation function C(k*), where Nsame(k*) and Nmixed(k*) represent the k* distributions of hadron–hadron pairs produced in the same and in different collisions, respectively, and ξ(k*) denotes the corrections for experimental effects. d, Sketch of the resulting shape of C(k*). The value of the correlation function is proportional to the interaction strength. It is above unity for an attractive (green) potential, and between zero and unity for a repulsive (dotted red) potential.
Equation in c: $C(k^*) = \int S(r^*)\,|\psi(k^*, r^*)|^2\,\mathrm{d}^3r^* = \xi(k^*)\,N_{\mathrm{same}}(k^*)/N_{\mathrm{mixed}}(k^*)$. Axes: V(r*) (MeV) versus r* (fm) in b; C(k*) versus k* in d.
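The measured side of the relation in panel c can be sketched as a bin-by-bin ratio of two histograms. The code below is a minimal illustration, assuming the same-event and mixed-event pair counts are already binned in k* and that the mixed-event sample has been normalized beforehand; the function name, the default of ξ = 1 and the toy counts are hypothetical and are not taken from the analysis.

```python
import math


def correlation_function(n_same, n_mixed, xi=None):
    """Return C(k*) = xi(k*) * N_same(k*) / N_mixed(k*) for each k* bin.

    n_same  -- pair counts per bin from same-event combinations
    n_mixed -- pair counts per bin from mixed-event combinations
    xi      -- optional per-bin correction factors (defaults to 1 in every bin)
    """
    if len(n_same) != len(n_mixed):
        raise ValueError("same-event and mixed-event histograms must match")
    if xi is None:
        xi = [1.0] * len(n_same)
    if len(xi) != len(n_same):
        raise ValueError("one correction factor is needed per bin")
    values = []
    for same, mixed, factor in zip(n_same, n_mixed, xi):
        # An empty mixed-event bin carries no information, so mark it as NaN.
        values.append(factor * same / mixed if mixed > 0 else math.nan)
    return values


# Toy counts: an excess of same-event pairs at low k* that fades at high k*.
toy_same = [150, 120, 100]
toy_mixed = [100, 100, 100]
print(correlation_function(toy_same, toy_mixed))
```

With these toy counts the ratio sits above unity in the lowest bins and approaches unity at larger k*, the pattern the caption associates with an attractive interaction.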
Y–N or Y–Y bound state. Although numerous theoretical predictions exist13,26–30, so far no clear evidence for any such bound states has been found, despite many experimental searches31–35.
Additionally, a precise knowledge of the Y–N and Y–Y interactions has important consequences for the physics of neutron stars. Indeed, the structure of the innermost core of neutron stars is still completely unknown and hyperons could appear in such environments depending on the Y–N and Y–Y interactions36. Real progress in this area calls for new experimental methods.
Studies of the Y–N interaction via correlations have been pioneered by the HADES collaboration37. Recently, the ALICE Collaboration has demonstrated that p–p and p–Pb collisions at the LHC are best suited to study the N–N and several Y–N, Y–Y interactions precisely8–12. Indeed, the collision energy and rate available at the LHC open the phase space for an abundant production of any strange hadron38, and the capabilities of the ALICE detector for particle identification and the momentum resolution (with values below 1% for transverse momentum pT < 1 GeV/c) facilitate the investigation of correlations in momentum space. These correlations reflect the properties of the interaction and hence can be used to test theoretical predictions by solving the Schrödinger equation for proton–hyperon collisions39. A fundamental advantage of p–p and p–Pb collisions at LHC energies is the fact that all hadrons originate from very small space-time volumes, with typical inter-hadron distances of about 1 fm. These small distances are linked through the uncertainty principle to a large range of the relative momentum (up to 200 MeV/c) for the baryon pair and enable us to test short-range interactions. Additionally, detailed modelling of a common source for all produced baryons15 allows us to determine accurately the source parameters.
Similar studies were carried out in ultrarelativistic Au–Au collisions at a centre-of-mass energy of 200 GeV per nucleon pair by the STAR collaboration for Λ–Λ (refs 40, 41) and p–Ω− (ref. 42) interactions. This collision system leads to comparatively large particle emitting sources of 3–5 fm. The resulting relative momentum range is below 40 MeV/c, implying reduced sensitivity to interactions at distances shorter than 1 fm.
In this work, we present a precision study of the most exotic among the proton–hyperon interactions, obtained via the p–Ω− correlation function in p–p collisions at a centre-of-mass energy √s = 13 TeV at the LHC. The comparison of the measured correlation function with first-principle calculations13 and with a new precision measurement of the p–Ξ− correlation in the same collision system provides the first observation of the effect of the strong interaction for the p–Ω− pair. The implications of the measured correlations for a possible p–Ω− bound state are also discussed. These experimental results challenge the interpretation of the data in terms of lattice QCD as the precision of the data improves.
Our measurement opens a new chapter for experimental methods in hadron physics with the potential to pin down the strong interaction for all known proton–hyperon pairs.

Analysis of the correlation function
Figure 1 shows a schematic representation of the correlation method used in this analysis. The correlation function can be expressed theoretically43,44 as $C(k^*) = \int \mathrm{d}^3r^*\, S(r^*)\,|\psi(\mathbf{k}^*, \mathbf{r}^*)|^2$, where $\mathbf{k}^*$ and $\mathbf{r}^*$ are the relative momentum and relative distance of the pair of interest. $S(r^*)$ is the distribution of the distance $r^* = |\mathbf{r}^*|$ at which particles are emitted (defining the source size), $\psi(\mathbf{k}^*, \mathbf{r}^*)$ represents the wavefunction of the relative motion for the pair of interest and $k^* = |\mathbf{k}^*|$ is the reduced relative momentum of the pair ($k^* = |\mathbf{p}^*_2 - \mathbf{p}^*_1|/2$). Given an interaction potential between two hadrons as a function of their relative distance, a non-relativistic Schrödinger equation can be used39 to obtain the corresponding wavefunction and hence also predict the expected correlation function. The choice of a non-relativistic Schrödinger equation is motivated by the fact that the typical relative momenta relevant for the strong final-state interaction have a maximal value of 200 MeV/c. Experimentally, this correlation function is computed as $C(k^*) = \xi(k^*)\,[N_{\mathrm{same}}(k^*)/N_{\mathrm{mixed}}(k^*)]$, where ξ(k*) denotes the corrections for experimental effects, Nsame(k*) is the number of pairs with a given k* obtained by combining particles produced in the same collision (event), which constitute a sample of correlated pairs, and Nmixed(k*) is
the number of uncorrelated pairs with the same k*, obtained by combining particles produced in different collisions (the so-called mixed-event technique). Figure 1d shows how an attractive or repulsive interaction is mapped into the correlation function. For an attractive interaction the magnitude of the correlation function will be above unity for small values of k*, whereas for a repulsive interaction it will be between zero and unity. In the former case, the presence of a bound state would create a depletion of the correlation function with a depth increasing with increasing binding energy.
Correlations can occur in nature from quantum mechanical interference, resonances, conservation laws or final-state interactions. Here, it is the final-state interactions that contribute predominantly at low relative momentum; in this work we focus on the strong and Coulomb interactions in pairs composed of a proton and either a Ξ− or a Ω− hyperon.
Protons do not decay and can hence be directly identified within the ALICE detector, but Ξ− and Ω− baryons are detected through their weak decays, Ξ− → Λ + π− and Ω− → Λ + Κ−. The identification and momentum measurement of protons, Ξ−, Ω− and their respective antiparticles are described in Methods. Figure 2 shows a sketch of the Ω− decay and the invariant mass distribution of the ΛΚ− and Λ̄K+ pairs. The clear peak corresponding to the rare Ω− and Ω̄+ baryons demonstrates the excellent identification capability, which is the key ingredient for this measurement. The contamination from misidentification is ≤5%. For the Ξ− (Ξ̄+) baryon the misidentification amounts to 8% (ref. 11).
Once the p, Ω− and Ξ− candidates and charge conjugates are selected and their 3-momenta measured, the correlation functions can be built. Since we assume that the same interaction governs baryon–baryon and antibaryon–antibaryon pairs8, we consider in the following the direct sum (⊕) of particles and antiparticles (p–Ξ− ⊕ p̄–Ξ̄+ ≡ p–Ξ− and p–Ω− ⊕ p̄–Ω̄+ ≡ p–Ω−). The determination of the correction ξ(k*) and the evaluation of the systematic uncertainties are described in Methods.

Comparison of the p–Ξ− and p–Ω− interactions
The obtained correlation functions are shown in Fig. 3a, b for the p–Ξ− and p–Ω− pairs, respectively, along with the statistical and systematic uncertainties. The fact that both correlations are well above unity implies the presence of an attractive interaction for both systems. For opposite-charge pairs, as considered here, the Coulomb interaction is attractive and its effect on the correlation function is illustrated by the green curves in both panels of Fig. 3. These curves have been obtained by solving the Schrödinger equation for p–Ξ− and p–Ω− pairs using the Correlation Analysis Tool using the Schrödinger equation (CATS) equation solver39, considering only the Coulomb interaction and assuming that the shape of the source follows a Gaussian distribution with a width equal to 1.02 ± 0.05 fm for the p–Ξ− system and to 0.95 ± 0.06 fm for the p–Ω− system, respectively. The source-size values have been determined via an independent analysis of p–p correlations15, where modifications of the source distribution due to strong decays of short-lived resonances are taken into account, and the source size is determined as a function of the transverse mass mT of the pair, as

Fig. 2 | Reconstruction of the Ω− and Ω̄+ signals. Sketch of the weak decay of Ω− into a Λ and a Κ−, and measured invariant mass distribution (blue points) of ΛΚ− and Λ̄K+ combinations. The dotted red line represents the fit to the data including signal and background, and the black dotted line the background alone. The contamination from misidentification is ≤5%. Horizontal axis: mΛΚ (GeV/c^2).

Fig. 3 | Experimental p–Ξ− and p–Ω− correlation functions. a, b, Measured p–Ξ− (a) and p–Ω− (b) correlation functions in high multiplicity p–p collisions at √s = 13 TeV. The experimental data are shown as black symbols. The black vertical bars and the grey boxes represent the statistical and systematic uncertainties. The square brackets show the bin width and the horizontal black lines represent the statistical uncertainty in the determination of the mean k* for each bin. The measurements are compared with theoretical predictions, shown as coloured bands, that assume either Coulomb or Coulomb + strong HAL QCD interactions. For the p–Ω− system the orange band represents the prediction considering only the elastic contributions and the blue band represents the prediction considering both elastic and inelastic contributions. The width of the curves including HAL QCD predictions represents the uncertainty associated with the calculation (see Methods section ‘Corrections of the correlation function’ for details) and the grey shaded band represents, in addition, the uncertainties associated with the determination of the source radius. The width of the Coulomb curves represents only the uncertainty associated with the source radius. The considered radius values are 1.02 ± 0.05 fm for p–Ξ− and 0.95 ± 0.06 fm for p–Ω− pairs, respectively. The inset in b shows an expanded view of the p–Ω− correlation function for C(k*) close to unity. For more details see text. Axes: C(k*) versus k* (MeV/c).
described in Methods. The average mT values of the p–Ξ− and p–Ω− pairs are 1.9 GeV/c and 2.2 GeV/c, respectively. The difference in size between the source of the p–Ξ− and p–Ω− pairs might reflect the contribution of collective effects such as (an)isotropic flow. The width of the green curves in Fig. 3 reflects the quoted uncertainty of the measured source radius. The correlations obtained, accounting only for the Coulomb interaction, considerably underestimate the strength of both measured correlations. This implies, in both cases, that an attractive interaction exists and exceeds the strength of the Coulomb interaction.

To discuss the comparison of the experimental data with the predictions from lattice QCD, it is useful to first focus on the distinct characteristics of the p–Ξ− and p–Ω− interactions. Figure 4 shows the radial shapes obtained for the strong-interaction potentials calculated from first principles by the HAL QCD (Hadrons to Atomic nuclei from Lattice QCD) collaboration for the p–Ξ− (ref. 14) and the p–Ω− systems13, see Methods for details. Only the most attractive (isospin I = 0 and spin S = 0) of the four components14 of the p–Ξ− interaction and the isospin I = 1/2 and spin S = 2 component of the p–Ω− interaction are shown. Aside from an attractive component, we see that the p–Ξ− interaction also contains a repulsive core starting at very small distances, below 0.2 fm. For the p–Ω− system no repulsive core is visible and the interaction is purely attractive. This very attractive interaction can accommodate a p–Ω− bound state, with a binding energy of about 2.5 MeV, considering the Coulomb and strong forces13. The p–Ξ− and p–Ω− interaction potentials look very similar to each other above a distance of 1 fm. This behaviour is not observed in phenomenological models that engage the exchange of heavy mesons and predict a quicker fall off of the potentials45.

Fig. 4 | Potentials for the p–Ξ− and p–Ω− interactions. p–Ξ− (pink) and p–Ω− (orange) interaction potentials as a function of the pair distance r (fm) predicted by the HAL QCD collaboration13,14. Only the most attractive component, isospin I = 0 and spin S = 0, is shown for p–Ξ−. For the p–Ω− interaction the I = 1/2 and spin S = 2 component is shown. The widths of the curves correspond to the uncertainties (see Methods section ‘Corrections of the correlation function’ for details) associated with the calculations. The inset shows the correlation functions C(k*) versus k* (MeV) obtained using the HAL QCD strong interaction potentials for: (i) the channel p–Ξ− with isospin I = 0 and spin S = 0, (ii) the channel p–Ξ− including all allowed spin and isospin combinations (dashed pink), and (iii) the channel p–Ω− with isospin I = 1/2 and spin S = 2. For details see text.

The inset of Fig. 4 shows the correlation functions obtained using the HAL QCD strong interaction potentials for: (i) the channel p–Ξ− with isospin I = 0 and spin S = 0, (ii) the channel p–Ξ− including all allowed spin and isospin combinations, and (iii) the channel p–Ω− with isospin I = 1/2 and spin S = 2. The correlation functions are computed using the experimental values for the p–Ξ− and p–Ω− source-size. Despite the fact that the strong p–Ω− potential is more attractive than the p–Ξ− I = 0 and S = 0 potential, the resulting correlation function is lower. This is

are visible for the p–Ω− correlation. The theoretical predictions shown in Fig. 3 also include the effect of the Coulomb interaction.

Regarding the p–Ξ− interaction, it should be considered that strangeness-rearrangement processes can occur, such as pΞ− → ΛΛ, ΣΣ, ΛΣ. This means that the inverse processes (for example, ΛΛ → pΞ−) can also occur and modify the p–Ξ− correlation function. These contributions are accounted for within lattice calculations by exploiting the well known quark symmetries14 and are found to be very small. Moreover, the ALICE collaboration measured the Λ–Λ correlation in p–p and p–Pb collisions10 and good agreement with the shallow interaction predicted by the HAL QCD collaboration was found.

The resulting prediction for the correlation function, obtained by solving the Schrödinger equation for the single p–Ξ− channel including the HAL QCD strong and Coulomb interactions, is shown in Fig. 3a. The first measurement of the p–Ξ− interaction using p–Pb collisions11 showed a qualitative agreement to lattice QCD predictions. The improved precision of the data in the current analysis of p–p collisions is also in agreement with calculations that include both the HAL QCD and Coulomb interactions.

Detailed study of the p–Ω− correlation
Concerning the p–Ω− interaction, strangeness-rearrangement processes can also occur47, such as pΩ− → ΞΛ, ΞΣ. Such processes might affect the p–Ω− interaction in a different way depending on the relative orientation of the total spin and angular momentum of the pair. Since the proton has Jp = 1/2 and the Ω has JΩ = 3/2 and the orbital angular momentum L can be neglected for correlation studies that imply low relative momentum, the total angular momentum J equals the total spin S and can take on values of J = 2 or J = 1. The J = 2 state cannot couple to the strangeness-rearrangement processes discussed above, except through D-wave processes, which are strongly suppressed. For the J = 1 state only two limiting cases can be discussed in the absence of measurements of the pΩ− → ΞΛ, ΞΣ cross-sections.

The first case assumes that the effect of the inelastic channels is negligible for both configurations and that the radial behaviour of the interaction is driven by elastic processes, following the lattice QCD potential (see Fig. 4), for both the J = 2 and J = 1 channels. This results in a prediction, shown by the orange curve in Fig. 3b, that is close to the data in the low k* region. The second limiting case assumes, following a previous prescription47, that the J = 1 configuration is completely dominated by strangeness-rearrangement processes. The obtained correlation function is shown by the blue curve in Fig. 3b. This curve clearly deviates from the data. Both theoretical calculations also include the effect of the Coulomb interaction and they predict the existence of a p–Ω− bound state with a binding energy of 2.5 MeV, which causes a depletion in the correlation function in the k* region between 100 and 300 MeV/c, because pairs that form a bound state are lost to the correlation yield. The inset of Fig. 3 shows that in this k* region the data are consistent with unity and do not follow either of the two theoretical predictions.

At the moment, the lattice QCD predictions underestimate the data, but additional measurements are necessary to draw a firm conclusion on the existence of the bound state. Measurements of Λ–Ξ− and Σ0–Ξ− correlations will verify experimentally the strength of possible non-elastic contributions. Measurements of the p–Ω− correlation function in collision systems with slightly larger size (for example, p–Pb collisions at the LHC)11 will clarify the possible presence of a depletion in C(k*). Indeed, the appearance of a depletion in the correlation function depends on the interplay between the average intra-particle
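The spin counting used in the p–Ω− discussion (Jp = 1/2 and JΩ = 3/2 combining to J = 1 or J = 2 when L is neglected) can be checked with a few lines of Python. This is only an illustrative sketch: the helper names are hypothetical, and the statistical weights (2J + 1)/((2Jp + 1)(2JΩ + 1)) are the standard multiplicity counting, which this excerpt does not state explicitly and whose use in the analysis is not shown here.

```python
from fractions import Fraction


def coupled_spins(j1, j2):
    """Allowed total spins |j1 - j2|, ..., j1 + j2 in unit steps (L = 0 assumed)."""
    s = abs(j1 - j2)
    values = []
    while s <= j1 + j2:
        values.append(s)
        s += 1
    return values


def statistical_weights(j1, j2):
    """Share of spin states per total spin: (2S + 1) / ((2 j1 + 1)(2 j2 + 1))."""
    total = (2 * j1 + 1) * (2 * j2 + 1)
    return {s: (2 * s + 1) / total for s in coupled_spins(j1, j2)}


J_PROTON = Fraction(1, 2)  # quoted in the text
J_OMEGA = Fraction(3, 2)   # quoted in the text

spins = coupled_spins(J_PROTON, J_OMEGA)
weights = statistical_weights(J_PROTON, J_OMEGA)

for s in spins:
    print(f"J = {s}: weight {weights[s]}")
```

Under this counting the J = 2 configuration carries the larger share of the spin states, so the treatment of the J = 1 channel changes only part of the predicted correlation.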
The complex internal structure of molecules can be both useful and a hindrance: it represents a key resource for the development of tunable and programmable quantum devices1,9,10, but it is also responsible for strong inelastic losses during collisions11–14. Despite recent advances in molecular quantum science15–24, full control of elastic collisions between molecules has not been achieved, making it very difficult to create the low-entropy bulk molecular gases that are required for the exploration of rich many-body physics and emergent quantum phenomena1,25.

Here, we report the realization of highly tunable elastic interactions in a quantum gas of polar molecules through the application of an external electric field along a stack of two-dimensional (2D) layers generated with a one-dimensional optical lattice. The induced electric dipole moment in the laboratory frame gives rise to repulsive dipolar interactions that stabilize the molecular gas against reactive collisions and formation of collisional complexes. These long-range interactions provide a large elastic collision cross-section for identical ultracold fermionic molecules, in contrast to contact interactions26. We demonstrate the enhancement of dipolar interactions by several orders of magnitude and achieve a ratio of elastic-to-inelastic collisions that exceeds 100. This favourable interaction regime enables direct molecular thermalization and efficient evaporative cooling, allowing us to bring the molecular temperature T below the Fermi temperature TF. The onset of quantum degeneracy is signalled by deviations from the classical expansion energy as the ratio T/TF is reduced below unity7,27.

Our strategy follows previous theory proposals28–30 and our earlier experimental study on molecular reactions in quasi-two dimensions31. This geometry allows us to take advantage of the anisotropic character of the dipolar potential and retain only the repulsive side-to-side dipole–dipole interactions within each 2D site, while preventing the attractive head-to-tail interactions that facilitate losses at short range. Our recent advances in the production of degenerate Fermi gases of polar molecules7,8, combined with precise electric-field control using in-vacuum electrodes32 (Fig. 1), allow us to perform a systematic characterization of the properties of a 2D Fermi gas of polar molecules.

A long-lived 2D Fermi gas of polar molecules
The KRb 2D Fermi gas is created from an ultracold atomic mixture of fermionic 40K and bosonic 87Rb atoms. The atomic mixture is initially held in a crossed optical dipole trap (ODT) and then transferred into a single layer of a large-spacing lattice (LSL) with an 8-μm spatial period, which increases the mixture’s confinement along the vertical direction (y). The mixture is then transferred into a vertical lattice (VL) with spacing of 540 nm that confines it to a quasi-2D geometry. The intermediate LSL transfer results in the Rb cloud populating a controllable number of VL layers τ ranging between 5 and 15. We directly probe the number of occupied 2D layers via a matter-wave focusing technique on the Rb cloud (Fig. 1c)33,34. The measured τ is in excellent agreement with theoretical modelling of the in situ cloud size (see Methods).

Magneto-optical association is used to pair roughly half of the initial Rb atoms into ground-state KRb molecules35. This process is fast and coherent, and the resulting molecular cloud populates the same layers originally occupied by the Rb cloud. The leftover K and Rb atoms are selectively and quickly removed from the trap. In the VL, the trap frequencies are set to (ωx, ωy, ωz) = 2π × (40, 17,000, 40) Hz. The quoted trapping frequencies are for KRb throughout the paper unless otherwise
1JILA, National Institute of Standards and Technology, Boulder, CO, USA. 2Department of Physics, University of Colorado, Boulder, CO, USA. ✉e-mail: giva4289@colorado.edu; ye@jila.colorado.edu
Fig. 1 | Experimental setup. a, The 2D molecular cloud is trapped at the centre of the electrode assembly (grey). 2D optical trapping is achieved with the VL (green), which is loaded using the ODT (orange) and LSL (red). Absorption images of molecules are collected through the same lens as that used to focus the LSL. b, Sketch of the experiment as seen down the z axis. The bias electric field is generated along y, perpendicular to the 2D layers of the VL. c, Matter-wave focusing data of the Rb layers in the VL, which have a spacing of 540 nm.

Fig. 2 | Long-lived polar molecules in 2D. a, Time evolution of the molecular density n (×10^7 cm−2) at d = 0.2 D. b, Inelastic loss rate β as a function of dipole moment. All error bars are 1 standard error of the mean (s.e.), determined from two-body decay fits (equation (1)). The top x axis shows the bias electric field EDC (kV cm−1) at the corresponding dipole moment. c, Both β (grey circles) and the heating rate (orange squares) saturate at their minimum values near ωy = 2π × 7 kHz (vertical grey bar indicates uncertainty in the molecule temperature), consistent with the mechanism of quasi-2D dipolar scattering. Heating rate error bars are 1 s.e., determined from linear fits.

stated. We create a 2D gas with N ≈ 20,000 trapped molecules, a typical temperature T ≈ 250 nK, and T/TF ranging from 1.5 to 3 depending on τ. The 2D molecular cloud is at the centre of an in-vacuum six-electrode assembly composed of two indium tin oxide (ITO)-coated glass plates and four tungsten rods (Fig. 1a). With this, we generate a highly tunable bias electric field EDC that induces strong dipolar interactions between molecules (Fig. 1b). The ratio γ of the voltage of the rods to the voltage of the plates can be used to cancel the curvature introduced by the parallel plate edges (flat-field configuration) or to introduce additional curvatures and gradients for molecule manipulation.

The chemically reactive KRb molecules suffer from inelastic two-body losses2,12, which result in the average molecular density n decaying over time t according to a two-body rate equation of the form:
Fig. 4 | Evaporative cooling to the quantum degenerate regime. a, Cuts along the x axis of the combined electro-optical potential for the flat-field configuration (left panel) and at the end of the evaporation (right panel). b, Evolution of N and T (orange squares) at different stages of the evaporation trajectory at EDC = 6.5 kV cm−1 and d = 0.25 D. The power-law fit (orange line) yields Sevap = 1.06(15). The dashed grey line is for a constant T/TF, corresponding to Sevap = 2.0. Error bars are 1 s.e. of four independent measurements. c, Summary of Sevap versus d. All error bars are 1 s.e., determined from power-law fits. d, Average of 20 band-mapped absorption images of the molecular cloud in the x–y plane after 5.84 ms of time of flight for T/TF = 2.0(1) (top) and T/TF = 0.81(15) (bottom). e, Optical density (OD, dimensionless) profiles (orange circles for T/TF = 2.0(1) and grey diamonds for T/TF = 0.81(15)) of the images in d (integrated along y), together with the Fermi–Dirac fit to the whole cloud (grey line) and the Gaussian fit to the outer wings (orange line). f, Measurement of δU/U at different values of T/TF from the Fermi–Dirac fit to the entire cloud (grey circles) and from the Gaussian fit to the outer wings of the cloud (orange squares). The solid and dashed curves show δU/U for the 2D and 3D ideal Fermi gases, respectively. All error bars are 1 s.e., determined from Gaussian or polylogarithmic fits.
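As a rough cross-check of the T/TF values quoted for this system, the sketch below estimates the Fermi temperature per layer. It rests on assumptions that the text does not state: the ideal-gas result TF = ħω√(2N)/kB for a single-component 2D Fermi gas in an isotropic harmonic trap, and equal filling of all τ layers. The function names are hypothetical; N ≈ 20,000, T ≈ 250 nK, τ between 5 and 15 and the trap frequencies are taken from the text.

```python
import math

HBAR = 1.054571817e-34  # J s
K_B = 1.380649e-23      # J / K


def fermi_temperature_2d(n_per_layer, omega_radial):
    """Ideal 2D harmonic-trap estimate (assumed model): T_F = hbar * omega * sqrt(2 N) / k_B."""
    return HBAR * omega_radial * math.sqrt(2.0 * n_per_layer) / K_B


def level_spacing_kelvin(omega):
    """Harmonic level spacing hbar * omega expressed as a temperature."""
    return HBAR * omega / K_B


N_TOTAL = 20_000                  # molecules, from the text
T = 250e-9                        # K, from the text
OMEGA_R = 2 * math.pi * 40.0      # rad/s, in-plane trap frequency from the text
OMEGA_Y = 2 * math.pi * 17_000.0  # rad/s, lattice-direction frequency from the text


def t_over_tf(layers):
    """T/T_F assuming the molecules are spread evenly over `layers` layers."""
    return T / fermi_temperature_2d(N_TOTAL / layers, OMEGA_R)


for layers in (5, 15):
    print(f"tau = {layers:2d}: T/TF ~ {t_over_tf(layers):.2f}")
print(f"hbar*omega_y/k_B ~ {level_spacing_kelvin(OMEGA_Y) * 1e9:.0f} nK")
```

With these inputs the estimate falls in roughly the same range as the quoted T/TF of 1.5 to 3, and the level spacing along y exceeds the 250 nK temperature, as expected for a quasi-2D gas. Because TF scales as √N in this model, holding T/TF fixed requires N ∝ T², which matches the caption's Sevap = 2.0 line if Sevap is read as the log-log slope of N against T (an interpretation, not something this excerpt defines).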
$$T_{\mathrm{rel}} = \frac{m\omega_x^{2}\sigma^{2}}{k_{\mathrm{B}}\left(1 + \omega_x^{2}t^{2}\right)}. \qquad (7)$$

The release temperature Trel is proportional to the energy density U of the Fermi gas, U = 2kBTrel, which saturates to a non-zero value as T → 0. In contrast, the energy density Ucl = 2kBT of a classical gas approaches zero as T → 0.

When the Gaussian fit is constrained to only the outer wings (that is, the high-momentum states) of the cloud, we can extract a new width σout from which, using equation (7), we obtain a corrected temperature Tout through the relation:

$$T_{\mathrm{out}} = \frac{m\omega_x^{2}\sigma_{\mathrm{out}}^{2}}{k_{\mathrm{B}}\left(1 + \omega_x^{2}t^{2}\right)}. \qquad (8)$$

As the excluded region from the centre of the Gaussian fit is

Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request. Source data are provided with this paper.

53. Dalfovo, F., Giorgini, S., Pitaevskii, L. P. & Stringari, S. Theory of Bose–Einstein condensation in trapped gases. Rev. Mod. Phys. 71, 463–512 (1999).
54. Inguscio, M., Ketterle, W. & Salomon, C. Ultra-cold Fermi gases. In Proceedings of the International School of Physics ‘Enrico Fermi’ (2007).

Acknowledgements We acknowledge funding from NIST, DARPA DRINQS, ARO MURI and NSF Phys-1734006. We thank J. L. Bohn, A. M. Kaufman, and C. Miller for careful reading of the manuscript and T. Brown for technical assistance.

Author contributions All authors contributed to carrying out the experiments, interpreting the results, and writing the manuscript.
Extended Data Fig. 1 | Layer occupancy. Histogram of the average number per
layer (relative population) for the data shown in Fig. 1c.
Extended Data Fig. 2 | Trend of ωx/(2π) versus γ. Grey points are the
experimental measurements at EDC = 5 kV cm−1, the solid grey line is a linear fit to
guide the eye, and the dashed line is the prediction (Sim) from the
finite-element model. All error bars are 1 standard deviation of the mean.
Article
Extended Data Fig. 3 | Evaporation sequence. a, Ramp in EDC. b, Ramp in γ. c, Trap depth versus time from the finite-element model of electro-optical potential. d, Evolution of η, calculated by taking the ratio of the trap depth and temperature at each time point. e, Evolution of T/TF during the ramp. All error bars are 1 standard error of the mean.
Extended Data Fig. 4 | Fermi gas thermometry. Trend of Tout/Trel as a function
of the excluded region from the centre of the Gaussian fit for T/TF = 0.81(15)
(orange diamonds) and T/TF = 2.0(1) (black circles). Solid lines are Gaussian fits
to simulated density profiles for T/TF = 2.0 (black) and T/TF = 0.8 (orange). All
error bars are 1 standard error of the mean.
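The thermometry relations in equations (7) and (8) share one functional form, T = mω²σ²/(kB(1 + ω²t²)), evaluated with either the full Gaussian width σ or the outer-wing width σout. A minimal sketch of that conversion follows. The KRb mass (sum of approximate 40K and 87Rb atomic masses) and the choice ωx = 2π × 40 Hz are assumptions made for illustration, the function names are hypothetical, and the 5.84 ms time of flight is the value given in the Fig. 4 caption.

```python
import math

K_B = 1.380649e-23                # J / K
AMU = 1.66053906660e-27           # kg
M_KRB = (39.964 + 86.909) * AMU   # assumed 40K87Rb mass (binding energy neglected)


def release_temperature(sigma, omega_x, t_tof, mass=M_KRB):
    """Equations (7)/(8): temperature from a Gaussian width after time of flight."""
    return mass * omega_x**2 * sigma**2 / (K_B * (1.0 + omega_x**2 * t_tof**2))


def classical_width(temperature, omega_x, t_tof, mass=M_KRB):
    """Inverse relation for a classical gas: expected width at a given temperature."""
    return math.sqrt(K_B * temperature * (1.0 + omega_x**2 * t_tof**2) / mass) / omega_x


OMEGA_X = 2 * math.pi * 40.0  # rad/s, illustrative value
T_TOF = 5.84e-3               # s, from the Fig. 4 caption

sigma = classical_width(250e-9, OMEGA_X, T_TOF)
print(f"width for a 250 nK classical gas: {sigma * 1e6:.1f} um")
print(f"recovered temperature: {release_temperature(sigma, OMEGA_X, T_TOF) * 1e9:.1f} nK")
```

Passing σout instead of σ to `release_temperature` gives Tout, so the ratio Tout/Trel plotted in Extended Data Fig. 4 reduces to (σout/σ)² under these relations.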
Article
https://doi.org/10.1038/s41586-020-2981-6 William Loh1,3 ✉, Jules Stuart1,2,3, David Reens1, Colin D. Bruzewicz1, Danielle Braje1,
John Chiaverini1, Paul W. Juodawlkis1, Jeremy M. Sage1,2 & Robert McConnell1
Received: 16 January 2020
The ability to precisely measure time with a portable system has long been essential to navigation. The necessity for accurate and portable timekeeping inspired the development of Harrison’s marine chronometer nearly 300 years ago and continues to this day, reflected in modern societal reliance on the global positioning system (GPS). Recently, optical atomic clocks, operating at frequencies of hundreds of terahertz, have achieved performance far surpassing that of the best microwave clocks and have substantially advanced the precision with which time (and, equivalently, distance) can be measured. However, portable implementations of these optical clocks will require substantial modifications to the existing clock architecture, including miniaturization of the clock laser whose performance is central to the operation of the clock. A key challenge is to maintain the frequency stability of the clock laser while reducing its size. This stability is required for the laser to (1) remain within the vicinity of the atom’s narrow-linewidth transition during the period of time before the clock feedback is engaged (minutes) and (2) remain locked to the atomic transition between feedback cycles (milliseconds). These stringent requirements have thus far eliminated all possible candidates for clock interrogation apart from bulk-cavity-stabilized (BCS) lasers13–15, which exhibit currently unrivalled narrow linewidths of <1 Hz under high-vacuum operation, but otherwise are unwieldy and prone to vibration4,7,16,17.

A promising portable alternative to these BCS lasers has recently emerged via generation of stimulated Brillouin scattering (SBS) light in an ultrahigh quality factor (Q) resonator18–27. In comparison to the BCS laser, the SBS laser offers the advantages of reduced cavity volume, operation without vacuum, the ability to be rigidly mounted to any flat surface, and a potentially higher tolerance to vibration. Despite these properties, the application of the SBS laser to state-of-the-art atomic physics has yet to be demonstrated, primarily owing to the laser’s substantial frequency drift in response to temperature change21. Here, we overcome the challenges of drift by applying a recently developed technique28 to sense and control the SBS laser’s long-term frequency fluctuations in the regime of a few hundred hertz. We utilize the stabilized SBS light in the demonstration of a strontium-ion optical clock, which breaks the long-standing clock paradigm of requiring a BCS laser to serve as the master oscillator. Through a clock self-comparison measurement, we achieve a fractional frequency stability of 3.9 × 10^−14/√τ (with the interrogation time τ in s), beyond the short-term stability achievable by the best microwave clocks29 and
1Lincoln Laboratory, Massachusetts Institute of Technology, Lexington, MA, USA. 2Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, USA. 3These authors
Fig. 1 | SBS laser setup and stabilization scheme. a–e, Illustration of the steps (a–e) required for SBS stabilization. Two orthogonal polarization SBS signals are generated whose beat note is applied to an acousto-optic modulator (AOM) to compensate for the SBS laser’s frequency drift. See text for details. f, Diagram of the SBS laser setup. The AOM, electro-optic modulator (EOM), semiconductor optical amplifier (SOA), polarization rotator (Rot.), polarization beam splitter (PBS), photodetector (PD), local oscillator (LO) and proportional-integral-derivative (PID) controller enable the independent locking of a pump laser to two orthogonal polarization modes of the resonator (Res.). Two counter-propagating SBS lasers (SBS 1 and SBS 2) are created from the pump lasers. The blue shaded area designates the region where feedforward stabilization is applied to the SBS laser and will be continued further in h. g, Photograph of the SBS resonator resting within a copper enclosure and placed next to a quarter dollar. The SBS resonator including the enclosure and lid occupies a total space of 3.1 inch × 3.5 inch × 1 inch. h, Diagram of the SBS stabilization circuitry. The beat note of two SBS lasers is first converted to a voltage signal through a voltage-controlled oscillator (VCO) operating in a phase-locked loop (PLL). Afterwards, a linear ramp subtraction, low-pass filter (LPF) and factor of 39 multiplication is applied before this signal is used for correcting the SBS laser drift. The dashed box indicates the elements that comprise the PLL.
close to an order of magnitude away from state-of-the-art laboratory single-ion clocks3.

For our clock demonstration, we interrogate a 88Sr+ ion using a fibre-cavity SBS laser with radiation at 1,348 nm that is frequency doubled to reach the narrow-linewidth 88Sr+ clock transition at 674 nm (ref. 30). Figure 1a–e depicts our overall strategy for achieving a stable SBS optical atomic clock. Starting from two orthogonally polarized pump lasers input into a high-Q optical-fibre resonator (Fig. 1a), the resonator generates two orthogonally polarized SBS outputs that are Stokes shifted from their respective pumps by about 12.5 GHz (Fig. 1b). The two SBS lasers interfere on a photodetector to produce a 180-MHz beat note (Fig. 1c), the frequency deviation of which predominantly corresponds to the temperature drift of the SBS resonator28. This method of temperature sensing is based on techniques that connect the differential temperature sensitivity between two orthogonal polarization resonator modes to a measurement of temperature change31–33. However, owing to issues associated with both the detection and control of small shifts in temperature, the best prior stabilization efforts have so far been limited to a range of about 10 μK (ref. 34). Here, as a consequence of the exceptionally narrow SBS lasing linewidth, the resolution of our temperature sensor reaches below 100 nK. In order to circumvent the need for direct control of temperature, we apply the dual-polarization beat note (Pol. beat) as a feedforward correction to the SBS laser’s frequency (Fig. 1d). This correction brings the SBS laser’s stability to a regime where it can be used to interrogate a 88Sr+ ion optical clock (Fig. 1e).

The SBS laser (Fig. 1f) consists of a single pump laser whose output at 1,348 nm is split into two separate paths (see Methods section ‘SBS laser setup’). On one path, an acousto-optic modulator (AOM) is used both to shift the light by approximately 180 MHz and to enable independent control of its frequency. Afterwards, the polarization of one of the pump beams is rotated by 90°, and the two pump beams are then sent into the SBS resonator in opposite directions to accomplish Pound–Drever–Hall (PDH) locking35 to the resonator’s two orthogonal polarization modes. The SBS resonator itself exhibits a loaded Q of 1.9 × 10^8 and consists of 2 m of optical fibre wound around a 2-inch-diameter mandrel that rests within a 3.1 inch × 3.5 inch × 1 inch temperature-controlled copper enclosure (Fig. 1g). The two pump lasers each generate their own counter-propagating SBS beams, which upon leaving the resonator are both coupled out of the system through a pair of circulators. A portion of the SBS light is also monitored and used to stabilize the amplitude of the SBS laser. Past the circulator, the outputs of both orthogonal polarization SBS lasers are interfered on a photodetector, which serves as our method for sensing temperature change.

Figure 1h presents the feedforward circuitry used in the stabilization of the SBS laser (see Methods section ‘Feedforward implementation’). After the 180-MHz SBS temperature error signal is generated, a series
Fig. 2 | SBS laser drift cancellation procedure. a, Simplified diagram illustrating the correction of the SBS frequency drift via subtraction of the free-running SBS laser and a frequency multiplied (×39) polarization beat signal (Pol. beat ×39). b, Experimental time series of the free-running SBS laser (red) and the polarization beat-note (blue) drift. A numerical subtraction of the two (black) produces a residual linear drift. c, Numerical removal of the linear drift from the free-running SBS and polarization beat-note time series revealing the underlying temperature-induced frequency drift. The excellent agreement between the two demonstrates that the correction signal accurately tracks the SBS temperature drift. d, Subtraction of the SBS and polarization beat-note signals in Fig. 2c. The drift is numerically cancelled through the linear ramp and temperature correction procedure. e, Fractional frequency (Δf/f) noise measurements and corresponding extracted temperature deviation (ΔT) of the SBS laser. The green shaded section indicates the region where the SBS laser noise is gradually averaging down. We lock the SBS laser to the ion at these timescales for the demonstration of the optical atomic clock. The blue section indicates where the SBS laser experiences drift that can be mostly compensated for via subtraction. The red shaded section indicates where the SBS laser drift dominates. After subtraction, the resulting noise of the SBS laser reveals a short-term frequency variation of 22 Hz and a long-term frequency drift of 220 Hz, which can be further reduced by locking to the ion.
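The drift-cancellation steps summarized in Fig. 2 (multiply the polarization beat by 39, subtract it from the free-running SBS frequency, then remove the residual linear ramp) can be sketched numerically on synthetic data. Everything in the toy signal is an assumption chosen for illustration: the sinusoidal thermal drift, its 15 kHz amplitude and 5 min period, and the individual linear drifts of +100 and −60 kHz min−1, which are picked only so that their difference reproduces the quoted 160 kHz min−1. The function names are hypothetical and this is not the authors' analysis code.

```python
import math

MULT = 39             # multiplication factor quoted in the text
C = 299_792_458.0     # speed of light, m/s


def linear_fit(t, y):
    """Ordinary least-squares slope and intercept of y against t."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    slope = sum((a - mt) * (b - my) for a, b in zip(t, y)) / sum((a - mt) ** 2 for a in t)
    return slope, my - slope * mt


def simulate(n=721, span_min=12.0):
    """Toy traces in kHz: shared thermal drift plus opposing linear drifts (assumed)."""
    t = [span_min * i / (n - 1) for i in range(n)]
    thermal = [15.0 * math.sin(2 * math.pi * x / 5.0) for x in t]
    sbs = [th + 100.0 * x for th, x in zip(thermal, t)]                 # free-running SBS
    pol = [th / MULT - (60.0 / MULT) * x for th, x in zip(thermal, t)]  # polarization beat
    return t, sbs, pol


def cancel_drift(t, sbs, pol):
    """Feedforward subtraction followed by removal of the residual linear ramp."""
    diff = [s - MULT * p for s, p in zip(sbs, pol)]
    slope, intercept = linear_fit(t, diff)
    stabilized = [d - (slope * x + intercept) for d, x in zip(diff, t)]
    return slope, stabilized


def fractional(delta_hz, wavelength_nm):
    """Frequency excursion expressed as a fraction of the optical carrier."""
    return delta_hz / (C / (wavelength_nm * 1e-9))


t, sbs, pol = simulate()
residual_slope, stabilized = cancel_drift(t, sbs, pol)
print(f"residual linear drift: {residual_slope:.1f} kHz/min")
print(f"220 Hz at 1,348 nm: {fractional(220.0, 1348.0):.2e}")
```

In this noiseless toy the stabilized trace is zero to rounding error; in the measurement the remainder is the roughly 220 Hz fluctuation quoted in the caption, which corresponds to a fractional excursion near 10^−12 at 1,348 nm.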
of operations is applied to prepare this signal to serve as a correction to the SBS laser. Using a phase-locked loop (PLL), the temperature error is first converted to a voltage that is low-pass filtered and amplified before it is converted back again to a radio-frequency (RF) signal. A net frequency multiplication factor of 39 is achieved through this process, which enables the temperature error to precisely match and cancel the SBS frequency drift. The subtraction of a ramp signal also makes it possible to compensate for a residual linear SBS drift. After correction, the SBS laser output is frequency doubled to reach 674 nm for interrogation of the 88Sr+ ion. Together, Fig. 1f–h comprise the SBS laser subsystem whose output is directed onto a 88Sr+ ion for operation of an optical atomic clock.

Figure 2a demonstrates the feedforward procedure for correcting the SBS laser’s drift. In comparison to the feedback approach of ref. 28, the use of feedforward enables direct control of the SBS laser’s frequency with advantages in avoiding the slow servo response of controlling temperature at the fibre core and also in circumventing unintentional length shifts that arise from thermal expansion in the underlying copper plate. The beat note between two orthogonal polarization SBS lasers, which is used as a sensor for temperature change, is subtracted from one free-running SBS laser to stabilize its frequency drift. To demonstrate this procedure, both the frequency of the free-running SBS laser’s beat note with a reference BCS laser and the frequency of the dual-polarization beat note (see Methods section ‘Measurement of the SBS laser noise’) are experimentally measured at time intervals of 1 ms and plotted for the wavelength of 1,348 nm over a period of 12 min (Fig. 2b). The polarization beat is also numerically multiplied by the empirically determined correction factor of 39 in order to account for the common-mode suppression of the extracted temperature shifts. The resulting multiplied signal serves as the feedforward correction to temperature drift, which on numerical subtraction from the free-running SBS laser yields a residual linear drift of 160 kHz min−1. This linear drift results from the individual linear drifts of the SBS and polarization beat signals, which are in opposing directions for the case of Fig. 2b, and represents a parasitic frequency shift that our temperature sensing technique cannot account for. We attribute the
[Fig. 3 plot residue: panels a and b, x axis Time (min), traces labelled ‘Free running’ and ‘Stabilized’; panel c, y axis Normalized power (dB), annotation Δf = 50 Hz; panel d, annotations 48 Hz and 170 nK.]
stabilization. The measurement interval was 1 ms. b, Zoomed-in plot of the SBS frequency drift. The average long-term frequency deviation is 310 Hz. c, Measured lineshape of the stabilized SBS laser (red solid line) and corresponding Voigt fit with R² = 0.93 (dashed line). The measured linewidth (Δf) is 50 Hz. The spectrum is taken with a sweep time of 225 ms and a resolution bandwidth of 20 Hz. d, Fractional frequency (Δf/f) noise comparison between the stabilized SBS laser (red line) and the ideal drift-cancelled SBS laser of Fig. 2e.
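The numerical feedforward procedure that this stabilization implements (remove the linear drift from both traces, scale the polarization beat by the factor of 39, subtract) can be sketched on synthetic data. The traces, noise levels and drift rates below are invented for illustration and are not the measured data; only the factor of 39 is taken from the text.

```python
import numpy as np

# Synthetic stand-ins for the measured traces (all values are assumptions).
rng = np.random.default_rng(0)
t = np.arange(0.0, 720.0, 0.1)                          # 12 min time axis, seconds
temp_drift_hz = 15e3 * np.sin(2 * np.pi * t / 400.0)    # temperature-induced drift, ~30 kHz span
ratio = 39.0                                            # correction factor quoted in the text

# SBS frequency and polarization beat share the temperature term but have
# their own (opposing) linear drifts and independent measurement noise.
sbs_hz = temp_drift_hz + 2.0e3 * t + rng.normal(0.0, 50.0, t.size)
beat_hz = temp_drift_hz / ratio - 20.0 * t + rng.normal(0.0, 1.0, t.size)


def remove_linear(y, x):
    """Subtract the least-squares straight line from y(x)."""
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)


sbs_detrended = remove_linear(sbs_hz, t)
correction_hz = ratio * remove_linear(beat_hz, t)       # feedforward correction signal
stabilized_hz = sbs_detrended - correction_hz

print(f"free-running r.m.s. drift: {np.std(sbs_detrended):.0f} Hz")
print(f"stabilized r.m.s. drift:   {np.std(stabilized_hz):.0f} Hz")
```

Because detrending is linear, the shared temperature term cancels exactly in this idealized model and only the two noise terms remain; in the experiment the residual is set by how well the factor of 39 matches the true sensitivity ratio.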
observed linear drift to a slow relaxation of the SBS resonator over time (ref. 36; see Methods section ‘SBS laser residual linear drift’), which is projected to equilibrate on a timescale of months. Rather than performing this subtraction step first, we instead numerically remove the linear drift from the SBS and polarization-beat traces, which reveals the underlying temperature-induced drift of the SBS laser frequency (Fig. 2c). The correction signal, which now occupies a span of 30 kHz at the wavelength of 1,348 nm, shows excellent agreement with the remaining SBS laser drift for cancellation. With the linear drift removed, a subtraction of the polarization-beat frequency from the SBS laser frequency yields a stabilized SBS laser frequency with about 220-Hz frequency fluctuations (Fig. 2d).
Figure 2e shows the fractional frequency noise, that is, the root-mean-square (r.m.s.) frequency shift of the SBS laser normalized to the laser’s centre frequency, derived from the time series traces of Fig. 2b–d. The free-running 1,348-nm SBS laser reaches a minimum noise level of 1.4 × 10⁻¹³ (corresponding to a linewidth of 30 Hz) at 10 ms but becomes unbounded at longer timescales and increases to 5.8 × 10⁻¹⁰ at 100 s. When the measured drift is subtracted to account for both the linear and temperature-induced drift, the SBS laser maintains its performance of 1.0 × 10⁻¹³ (22 Hz) at short timescales and experiences a >500-fold reduction in drift over the long term. The SBS laser frequency excursions become bounded at the value of 1.0 × 10⁻¹² or 220 Hz, which corresponds to temperature stabilization of the SBS laser at a level below 120 nK.
We experimentally implement the numerical feedforward procedure of Fig. 2 and apply the corrections onto the SBS laser’s frequency via an AOM (Fig. 3a). The linear drift is accounted for by a ramp generator, and the factor of 39 multiplication is implemented through a PLL. The stabilized output is frequency doubled to 674 nm and is measured against a reference BCS laser for characterization of the SBS laser drift. For the remainder of this Article, we will refer to the frequency-doubled and amplified SBS laser as the ‘SBS laser subsystem’, whose output at 674 nm is used to interrogate the clock ion. The SBS laser subsystem, operating at 674 nm, starts initially in a free-running state for the first 6.5 min of measurement and drifts by 400 kHz within this time. Once the feedforward correction is applied, the SBS frequency drift flattens to a value near zero, as indicated by the horizontal dashed line guide of Fig. 3a. A zoomed-in trace of the stabilized SBS frequency drift (Fig. 3b) shows close agreement to Fig. 2d (accounting for the frequency doubling), and indicates that our experimental implementation of the feedforward stabilization accurately compensates for both linear and temperature-induced SBS laser drift.
The lineshape of the SBS laser subsystem (Fig. 3c), measured at 674 nm via beating with a BCS laser, demonstrates further the laser’s exceptional short-term noise. The spectrum exhibits a linewidth of 50 Hz. This value of linewidth is confirmed by the measured fractional frequency noise of the stabilized SBS laser (Fig. 3d), which reaches a minimum of 1.1 × 10⁻¹³ at 60 ms and corresponds to a frequency deviation of 48 Hz at 674 nm. At long timescales, the noise level becomes 1.4 × 10⁻¹² (620 Hz), which is slightly larger than the frequency excursions found with the numerical drift-cancelling procedure of Fig. 2e. We attribute this difference in drift to noise in the electronics used for feedforward stabilization. At the level of 620 Hz, our achieved frequency drift corresponds to temperature stabilization of the SBS laser subsystem at a level below 170 nK.
To demonstrate experimentally the practical capability of the SBS laser, we use the subsystem to run an atomic clock, stabilizing the laser to the narrow-linewidth S₁/₂ ↔ D₅/₂ quadrupole clock transition in ⁸⁸Sr⁺ (0.4 Hz natural linewidth). We interrogate a single strontium ion confined 50 μm above the surface of a microfabricated surface-electrode trap within a cryogenic ultrahigh-vacuum apparatus (ref. 37). As shown in Fig. 4a, the clock interrogation light is amplified through a series of injection-locked lasers followed by tapered amplifiers, with fibre-noise cancellation stages to mitigate phase noise picked up as the clock laser is routed between and across rooms within optical fibre; a single flipper mirror allows us to select either an existing BCS laser (see Methods section ‘Characteristics of the BCS laser’) or the SBS laser subsystem as the initial seed for the injection stages. To maximize the coherence time of the ion’s optical transition, we employ a system of passive magnetic field stabilization using persistent superconducting currents (ref. 38) and active laser-vibration compensation using an interferometric scheme similar
[Fig. 4 plot residue: panel a labels, SBS subsystem, AOM, ×2, Injected laser, Tapered amplifier, Cryogenic vacuum chamber; panel b, x axis Probe phase (rad); panel c, Ground state probability against Detuning (kHz), annotation Δf = 370 Hz; panel d, Δf/f against Time (s).]
Fig. 4 | SBS laser optical clock. a, Schematic of laser beam path to the clock chamber. A flipper mirror allows the SBS laser subsystem to be interchanged with a BCS laser as the master frequency source. The injected laser and tapered amplifier represent a series of two injection locking and amplification stages, respectively. An AOM is used to finely adjust the laser frequency before the beam is focused onto an ion inside the cryogenic vacuum chamber used as the clock chamber. b, Measured Ramsey fringes on the clock transition. Two π/2 pulses are applied around a τ = 1 ms interrogation period. The ground state probability is measured as a function of the phase of the probe pulse. c, Spectroscopy of the |5S₁/₂, m_J = −1/2⟩ → |4D₅/₂, m_J = −3/2⟩ transition in ⁸⁸Sr⁺ measured with the SBS laser subsystem. Spectroscopic measurements are interleaved with frequency corrections to keep the SBS laser subsystem locked to the atomic transition. The ground state probability is measured as a function of frequency detuning relative to the clock transition. The vertical blue error bars represent 1σ error derived from photon counting statistics. d, Fractional frequency (Δf/f) noise of the difference frequency between interleaved clocks. The measured fractional frequency noise is divided by √2 to estimate the error of a single clock, assuming even distribution of error in the correction signals. The blue points represent the frequency noise at a selection of averaging times, and the vertical blue bars indicate the 1σ error in this calculation (ref. 46). From a fit to these data (dashed red line) assuming a purely white noise spectrum, we obtain the function 3.9 × 10⁻¹⁴/√τ.
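As a rough illustration of the Ramsey fringe in panel b and of why the lock operates with the second pulse 90° out of phase, the following sketch uses an idealized two-level model. The sign convention, the unit contrast and the function names are assumptions made for illustration; the actual lock procedure is the one described in the Methods.

```python
import math


def ground_state_probability(phase_rad, detuning_hz=0.0, tau_s=1e-3, contrast=1.0):
    """Ideal Ramsey fringe: two pi/2 pulses separated by tau_s.

    phase_rad is the phase of the second pulse relative to the first;
    a laser detuning adds the phase 2*pi*detuning*tau during free evolution.
    """
    total_phase = phase_rad + 2.0 * math.pi * detuning_hz * tau_s
    return 0.5 * (1.0 - contrast * math.cos(total_phase))


def detuning_from_probability(p_ground, tau_s=1e-3, contrast=1.0):
    """Linearized frequency discriminator at a 90-degree probe phase."""
    return (p_ground - 0.5) / (math.pi * contrast * tau_s)


# At 90 degrees the fringe slope is steepest, so a small detuning
# produces the largest change in ground-state probability.
p = ground_state_probability(math.pi / 2, detuning_hz=10.0)
print(f"P_ground = {p:.4f}, inferred detuning = {detuning_from_probability(p):.2f} Hz")
```

In this model the discriminator slope at 90° is π·τ per hertz (about 3.1 × 10⁻³ Hz⁻¹ for τ = 1 ms), which is why a longer interrogation time gives a more sensitive error signal.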
to fibre noise cancellation (ref. 39). With these techniques, we perform a series of Ramsey experiments and deduce a coherence time of 2.9 ms with the BCS laser, which we take as an upper bound for the performance of the clock experiment. Assuming that frequency fluctuations in the laser are the dominant decoherence mechanism, we infer an effective laser linewidth of 1/(2π × 2.9 ms) = 55 Hz, which is close to the SBS laser subsystem’s linewidth in Fig. 3c. Thus we conclude that uncompensated noise in our experiment (for example, from optics subject to acoustic vibrations or magnetic field instability at the ion) contributes at about the same level as noise from the SBS laser subsystem (see Methods section ‘Limits of the optical clock measurement’ for discussion of noise sources). Altogether, these noise sources limit the clock interrogation time we achieve, and therefore set the ultimate limit in clock performance.
An AOM is used both for scanning the SBS laser subsystem over the clock transition and for locking the laser’s frequency to the atomic resonance. To discipline the SBS laser subsystem to the atomic resonance frequency, we create an error signal via a Ramsey experiment consisting of two π/2 pulses on the clock transition, separated by an interrogation time τ = 1 ms. As the phase of the second π/2 pulse is varied, the population in the lower clock state traces out a sine curve (Fig. 4b). The slope of this signal reaches a maximum when the second pulse is 90° out of phase with the first pulse. Variations in the laser frequency during the interrogation time result in an additional phase shift and are thus mapped to the ion’s state distribution. Between interrogation cycles, the state of the ion must be detected and then re-initialized into the lower clock level; this leads to a 1.85-ms dead time, during which the system is insensitive to frequency fluctuations of the laser (see Methods section ‘Clock protocol and simulation’ for more details of the lock procedure). As a demonstration of the efficacy of the atomic lock, we perform Rabi spectroscopy of the clock transition using the SBS laser subsystem, interleaved with frequency corrections following the protocol above (Fig. 4c). These spectroscopic measurements are performed with a pulse time of 10 ms (to avoid Fourier broadening of the line), which leads to a longer clock dead time of about 11 ms for these measurements. The width of the measured feature, 370 Hz, is in close correspondence with the simulated value of 250 Hz, derived from numerical application of the clock protocol to the free-running SBS laser subsystem noise measured in Fig. 3d, given the effective dead time of 11 ms. This provides further evidence that noise mechanisms downstream from the laser source do not substantially affect measurements with the SBS laser subsystem.
In order to assess the stability of the clock when running with the SBS laser subsystem, we perform a self-comparison measurement via two independently operated clock signals generated by interleaving distinct sets of correction signals applied to the laser (see Methods section
Clock protocol and simulation
Our clock protocol consists of a Ramsey measurement, with the initial π/2 pulse driven by the SBS laser, an interrogation time of 1 ms, and a final π/2 pulse 90° out of phase with the first pulse. Both pulses are amplitude corrected using a composite pulse sequence (ref. 50) to mitigate amplitude fluctuations. After completing this Ramsey sequence, we measure the ion state via state-dependent scattering of photons on the cycling S₁/₂ → P₁/₂ ion transition. After determining which of the two
Author contributions W.L., J.S. and R.M. conceived, designed and carried out the experiments with the SBS laser. W.L., J.S., D.R. and R.M. conceived, designed and carried out the experiments with the clock protocol. All authors discussed the results and contributed to the manuscript.
Competing interests The authors declare no competing interests.
Additional information
Correspondence and requests for materials should be addressed to W.L.
Peer review information Nature thanks Clément Lacroûte and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Article
Extended Data Fig. 1 | SBS laser noise. Shown is the measurement of the
674-nm SBS laser subsystem’s frequency noise before and after the application
of feedforward stabilization. The SBS laser subsystem features a white noise
floor of 3 Hz² Hz⁻¹ before feedforward and a gradual increase in noise at lower
offset frequencies. An integration over the noise spectral density yields a
full-width at half-maximum linewidth of 45 Hz, while a calculation of the
white-noise limited linewidth at large offset frequencies yields 9 Hz. With
feedforward turned on, the SBS laser subsystem’s noise increases slightly and
exhibits additional noise peaks that arise from the RF signal generator used for
feedforward correction. The integrated linewidth increases to 51 Hz.
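The white-noise-limited linewidth quoted in this caption is consistent with the standard relation for white frequency noise, in which a one-sided frequency-noise density S₀ (in Hz² Hz⁻¹) gives a Lorentzian full-width at half-maximum of π·S₀. The short check below assumes that relation and that the quoted floor is a one-sided density; the function name is illustrative.

```python
import math


def white_noise_linewidth_hz(noise_floor_hz2_per_hz):
    """Lorentzian FWHM for white frequency noise with a one-sided PSD S0."""
    return math.pi * noise_floor_hz2_per_hz


# Noise floor from the caption: 3 Hz^2/Hz before feedforward.
print(f"{white_noise_linewidth_hz(3.0):.1f} Hz")  # 9.4 Hz, quoted as 9 Hz
```

The larger integrated linewidth of 45 Hz reflects the additional low-offset-frequency noise, which this white-noise formula does not capture.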
Extended Data Fig. 2 | Optimization of feedforward stabilization. a, Measurement of the differential temperature sensitivity of the SBS resonator’s two orthogonal-polarization modes. The linewidths of the modes are measured to be 1.2 MHz (blue arrows). For an applied temperature shift of ΔT = −0.25 °C, the centre frequency at 1,348 nm changes by 490 MHz, while the mode separation (black arrows) changes by 11.9 MHz. This corresponds to a feedforward correction ratio of 40:1. b, Time series trace of the 1,348-nm SBS laser’s frequency with the linear drift removed and the SBS amplitude unservoed. The lack of correspondence between the free-running SBS (red trace) and the polarization beat note (‘Pol. beat ×39’; blue trace) indicates the inability to cancel frequency drift when amplitude noise is present.
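The numbers in panel a can be combined into a correction ratio and a temperature sensitivity, which in turn reproduce the order of magnitude of the nanokelvin stabilization levels quoted in the main text. This is one plausible reading of how those figures relate, not a calculation taken from the paper; the variable and function names are illustrative.

```python
# Values from the caption (magnitudes only).
DELTA_T_K = 0.25                 # applied temperature shift
CENTRE_SHIFT_HZ = 490e6          # shift of the centre frequency at 1,348 nm
MODE_SEPARATION_SHIFT_HZ = 11.9e6

correction_ratio = CENTRE_SHIFT_HZ / MODE_SEPARATION_SHIFT_HZ   # about 41, quoted as 40:1
sensitivity_hz_per_k = CENTRE_SHIFT_HZ / DELTA_T_K              # about 1.96 GHz/K at 1,348 nm


def temperature_equivalent_nk(excursion_hz_at_1348nm):
    """Temperature change that would produce the given frequency excursion."""
    return excursion_hz_at_1348nm / sensitivity_hz_per_k * 1e9


numerical_nk = temperature_equivalent_nk(220.0)          # 220 Hz at 1,348 nm
experimental_nk = temperature_equivalent_nk(620.0 / 2)   # 620 Hz at 674 nm is 310 Hz at 1,348 nm

print(f"ratio = {correction_ratio:.1f}")
print(f"numerical procedure: {numerical_nk:.0f} nK, experiment: {experimental_nk:.0f} nK")
```

Under these assumptions the two excursions correspond to roughly 112 nK and 158 nK, in line with the quoted bounds of 120 nK and 170 nK.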
Extended Data Fig. 3 | Linear drift decay of SBS resonator. Record of the
SBS laser’s linear drift at 1,348 nm for two resonators. The first resonator
(red circles) is tracked over 79 days, and its linear drift decreases to a value
of 200 Hz s⁻¹ at the end of the elapsed period of time. The second resonator
(blue squares) is tracked over 260 days and reaches a minimum of 30 Hz s⁻¹.
Extended Data Fig. 4 | Determining and tracking linear drift. a, Rabi spectroscopy of the |5S₁/₂, m_J = −1/2⟩ → |4D₅/₂, m_J = −3/2⟩ clock transition and the first-order motional sidebands at ν_clock ± ν_trap (blue and red sideband, respectively) is taken at regular intervals of approximately 25 s. After performing fits to the symmetric sidebands, we can average the two frequencies to obtain an accurate measure of the frequency of the central feature. Only the spectroscopic data for the final experiment (black line) is shown; for all other datasets, the Gaussian peak fits to the sidebands (red and blue curves) are shown with progressively darkening colour to illustrate the movement of these features over time. For these data, an intentional linear drift of 5 kHz s⁻¹ (at 674 nm; equivalently 2.5 kHz s⁻¹ at 1,348 nm) was applied to demonstrate the efficacy of this method in cases of high drift, as in the initial few points shown in Extended Data Fig. 3. b, Rabi spectroscopy of the clock transition and sidebands after applying a linear drift correction to null out the natural drift of the resonator’s frequency. Over the course of 20 min of measurements, very little deviation in the centre frequency is observed. c, Linear drift determined from the data presented in a and b. The linear drift can be obtained from a fit (lines) to the apparent frequency of the clock transition as a function of time (data points). In the first case, with the large drift intentionally applied to the laser frequency, we obtain a drift of 5.2 kHz s⁻¹ (at 674 nm) from the fit (green line). After a few iterations of applying a correction and measuring the resulting drift, the drift is driven down to 17 Hz s⁻¹ (blue line). d, Integrated clock correction signal applied to the laser to keep the frequency resonant with the atom’s transition. In this case, we use a simplified clock protocol with an interrogation time of τ = 100 μs and no interleaving.
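The drift-tracking idea in panels a and c (average the two sideband centres to locate the carrier, then fit a line to the carrier frequency against time) can be sketched as follows. The trap frequency, sampling and noise-free data are hypothetical placeholders; only the 25-s spacing and the 5 kHz s⁻¹ applied drift come from the caption.

```python
import numpy as np


def carrier_from_sidebands(red_hz, blue_hz):
    """Carrier frequency as the mean of the symmetric sideband centres."""
    return 0.5 * (np.asarray(red_hz) + np.asarray(blue_hz))


def linear_drift_hz_per_s(times_s, carrier_hz):
    """Slope of a least-squares line through carrier frequency vs time."""
    slope, _intercept = np.polyfit(times_s, carrier_hz, 1)
    return slope


# Synthetic example: scans every 25 s with an applied drift of 5 kHz/s at 674 nm.
times_s = np.arange(0.0, 250.0, 25.0)
trap_hz = 1.0e6                       # hypothetical trap frequency
true_carrier_hz = 5.0e3 * times_s     # detuning of the carrier from its initial value
red_hz = true_carrier_hz - trap_hz    # fitted red-sideband centres
blue_hz = true_carrier_hz + trap_hz   # fitted blue-sideband centres

drift_674 = linear_drift_hz_per_s(times_s, carrier_from_sidebands(red_hz, blue_hz))
drift_1348 = drift_674 / 2.0          # frequency doubling: halve to refer back to 1,348 nm
print(f"{drift_674:.1f} Hz/s at 674 nm, {drift_1348:.1f} Hz/s at 1,348 nm")
```

Averaging the sidebands removes any common dependence on the trap frequency, so the estimate relies only on the two fitted peak centres.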
Extended Data Fig. 5 | BCS laser ⁸⁸Sr⁺ ion clock. Measured interleaved clock performance comprising a BCS laser locked to a ⁸⁸Sr⁺ ion operating with 1-ms interrogation time (blue points). The effective dead time is 4.7 ms. The blue points represent the frequency noise at a selection of averaging times, and the vertical blue bars indicate 1σ error. A fit (dashed line) to the data yields a stability of 3.1 × 10⁻¹⁴/√τ, which is slightly lower than that of the same clock operated with an SBS laser.
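To put the two fitted stabilities side by side, the sketch below evaluates a white-noise-limited instability of the form coefficient/√τ for the BCS-seeded clock (3.1 × 10⁻¹⁴) and for the SBS-seeded clock of Fig. 4d (3.9 × 10⁻¹⁴), and converts the result to hertz at 674 nm. The 1/√τ form follows from the white-noise assumption stated for the fits; the chosen averaging time and the function names are illustrative.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458.0
CLOCK_FREQUENCY_HZ = SPEED_OF_LIGHT_M_PER_S / 674e-9   # optical frequency at 674 nm


def white_noise_instability(coefficient, tau_s):
    """Fractional instability for white frequency noise: coefficient / sqrt(tau)."""
    return coefficient / math.sqrt(tau_s)


tau_s = 100.0
sbs = white_noise_instability(3.9e-14, tau_s)
bcs = white_noise_instability(3.1e-14, tau_s)

print(f"SBS: {sbs:.2e} ({sbs * CLOCK_FREQUENCY_HZ:.2f} Hz)")
print(f"BCS: {bcs:.2e} ({bcs * CLOCK_FREQUENCY_HZ:.2f} Hz)")
print(f"ratio SBS/BCS: {sbs / bcs:.2f}")
```

Because both fits share the same τ dependence, the ratio of about 1.26 is independent of the averaging time.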
Extended Data Fig. 6 | Schematic of interleaved clock protocol. A pictorial representation of the interleaved clock procedure is shown. Here the Doppler segments represent the 700-μs duration in which the ion is Doppler cooled. During the OP segments, the ion undergoes 450 μs of optical pumping in order to prepare the electron in the lower level of the clock transition. The ‘Interrogate’ segments are each 1 ms of interrogation time, bounded by composite π/2 pulses. Last, the ‘Detect’ segments are 700 μs of detection time, during which the photons emitted by the ion are detected on a photomultiplier tube and counted by our timing controller. During the ‘Update’ segment, and depending on the number of photons collected, the state of the ion is determined, and the frequency of the clock is either increased or decreased. As discussed in the text, two separate clock signals, f(1) and f(2), are maintained; here these are indicated as Clock 1 and Clock 2. While the frequency of either clock is updated, the experiment begins to prepare the state for the next measurement, as indicated by the black arrows. Each of these clocks is sensitive to laser frequency fluctuation only during the 1 ms interrogation period of the total 5.7 ms cycle time; during all other times, the frequency of the laser must stay within the capture range of the lock.
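The timing figures quoted for the protocol can be cross-checked by adding up the segment durations. The sketch assumes that the ‘Update’ step adds no time of its own (the caption says preparation for the next measurement begins while the update runs); the dictionary keys are illustrative names.

```python
# Segment durations from the caption, in microseconds.
SEGMENTS_US = {
    "doppler_cooling": 700,
    "optical_pumping": 450,
    "interrogate": 1000,
    "detect": 700,
}

single_clock_us = sum(SEGMENTS_US.values())                     # one clock's sequence
dead_time_us = single_clock_us - SEGMENTS_US["interrogate"]     # insensitive time per sequence
interleaved_cycle_us = 2 * single_clock_us                      # Clock 1 followed by Clock 2
effective_dead_time_us = interleaved_cycle_us - SEGMENTS_US["interrogate"]
duty_cycle = SEGMENTS_US["interrogate"] / interleaved_cycle_us  # per-clock sensitivity window

print(dead_time_us, interleaved_cycle_us, effective_dead_time_us)
print(f"duty cycle per clock: {duty_cycle:.3f}")
```

The sums reproduce the 1.85-ms dead time, the 5.7-ms interleaved cycle and the 4.7-ms effective dead time per clock that appear elsewhere in the text.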
The Kelvin equation predicts that capillaries become spontaneously filled with water at the relative humidity
$$\mathrm{RH_K} = \exp\left(-\frac{2\sigma}{k_\mathrm{B} T d \rho_\mathrm{N}}\right) \qquad (1)$$
where σ ≈ 73 mJ m⁻² is the surface tension of water at room temperature T, ρ_N ≈ 3.3 × 10²⁸ m⁻³ is the number density of water, k_B is the Boltzmann constant and d is the diameter of the meniscus curvature. For a two-dimensional (2D) confinement created by parallel walls separated by a distance h, d = h/cos θ where θ is the contact angle of water on the walls’ material. For capillary condensation to occur at relative humidity (RH) considerably below 100%, equation (1) dictates that d must be comparable to 2σ/(k_B T ρ_N) ≈ 1.1 nm. For example, under typical ambient RH of 40–50%, water is expected to condense in slits with h < 1.5 nm and cylindrical pores with diameters <3 nm, if θ is close to zero. Even stronger confinement is required for capillaries involving less hydrophilic materials. So far, a broad consensus has been reached that the Kelvin equation remains accurate for menisci with d ≥ 8 nm (refs. 1–4,6–11) and can also describe condensation phenomena in hydrophilic pores as small as 4 nm in diameter (refs. 12–14). To achieve agreement with the experiments at this scale, the Kelvin equation is usually modified to account for ‘wetting films’ that are adsorbed on internal surfaces before the condensation transition and effectively narrow the capillaries. For the smallest capillaries, the thickness of the wetting films was used as a free parameter. In the real world, pores, cracks and cavities obviously do not terminate at the scale of several nanometres but extend even below 1 nm or 2σ/(k_B T ρ_N), the fact that makes condensation phenomena omnipresent under ambient conditions. The latter scale is comparable to the diameter of water molecules, which makes it challenging to study experimentally because of difficulties in creating the required atomic-scale confinement (refs. 1,10,12), the varying thickness of wetting films (refs. 1,2,7–13,17) and huge capillary pressures that can cause considerable deformations (refs. 13,23–25). As for theory, the Kelvin equation is also believed to reach its applicability limit for confinement containing a few molecular layers because, at this smallest scale, the properties of water notably change (refs. 2,3,12,15,16) and the description in terms of homogeneous macroscopic thermodynamics becomes questionable (refs. 1–4,16–20), leaving aside the fact that such quantities as d and θ in equation (1) can no longer be defined (refs. 1–3,18–20).
The capillary devices that we studied are shown schematically in Fig. 1a. Their most important part is atomically flat 2D channels made
¹National Graphene Institute, University of Manchester, Manchester, UK. ²Department of Physics and Astronomy, University of Manchester, Manchester, UK. ³School of Materials, University of Manchester, Manchester, UK. ⁴Key Laboratory of Advanced Technologies of Materials, School of Materials Science and Engineering, Southwest Jiaotong University, Chengdu, China. ⁵Chinese Academy of Sciences Key Laboratory of Mechanical Behavior and Design of Materials, Department of Modern Mechanics, University of Science and Technology of China, Hefei, China.
✉e-mail: qian.yang-2@manchester.ac.uk; wangfc@ustc.edu.cn; geim@manchester.ac.uk
[Fig. 1 image residue: labels Top crystal, Four graphene layers, Graphene spacers, Water molecules, Chamber with controlled humidity; scale markers 20 nm, 500 nm and 3 Å; x axis of panel e, Relative humidity (%).]
Fig. 1 | Atomic-scale capillaries and water condensation inside. a, Schematic of the capillary devices studied here. b, Cross-sectional imaging of a four-layer graphite capillary by scanning transmission electron microscopy (STEM). The top layer was more than 100 nm thick in this case. c, d, AFM imaging of the same mica capillary (N = 11) exposed to 30% and 95% relative humidity, respectively. In the dry state, the top crystal sagged by ∼5 Å, but it became flat at high RH, as illustrated in the corresponding schematics above the images. The black dotted lines indicate the edge of the top crystal (compare with a). In the upper part of the AFM images, the colour scale is given by the observed sagging (grey curves). The bottom part shows graphene spacers without the top crystal cover (dark-to-bright scale, 40 Å). e, Sagging depth δ as a function of RH for a graphite capillary with N = 4. Coloured symbols, AFM measurements. The grey symbol with error bars indicates our experimental accuracy. The two solid curves in orange indicate the constant sagging δ₀ below the condensation transition and the ln(RH) dependence above it. The transition is marked by the dashed vertical line. Inset, AFM profiles (averaged over 100 nm along the channel) of the top crystal for several values of RH. All the AFM measurements were carried out in the non-contact PeakForce mode (Methods, ‘AFM topography under controlled humidity’).
by van der Waals (vdW) assembly following the fabrication procedures described in the Methods. In brief, two atomically flat crystals were exfoliated from bulk muscovite mica or graphite to become the top and bottom walls of our capillaries. Separately, narrow strips of multilayer graphene were fabricated to serve as spacers between the two mica or graphite crystals. Stacking the crystals and spacers on top of each other resulted in the 2D channels shown in Fig. 1 and Extended Data Fig. 1. We used graphene spacers between N = 2 and about 10 layers thick so that the capillaries had the designated height Na (see Fig. 1a), where a ≈ 3.35 Å is the effective thickness of monolayer graphene (refs. 26,27). Examples of transmission electron microscopy imaging of our capillaries are provided in Fig. 1b and Extended Data Fig. 1d. Mica and graphite were chosen as archetypal strongly and weakly hydrophilic materials. Their contact angles are known to be in the range of 0–20° and 55–85°, respectively (refs. 16,28,29). For surfaces exposed to air, θ is close to the above upper bounds (refs. 28,29) (Methods).
To detect RH at which capillary condensation occurred in the 2D capillaries described above, we exploited the fact (refs. 26,30) that the suspended thin crystals exhibited noticeable sagging caused by their vdW adhesion to sidewalls (Fig. 1c). In our experiments we found that, when the capillaries became filled with water, the sagging depth δ diminished (Fig. 1d), presumably because intercalating water molecules ‘screen’ the adhesion (refs. 27,30). To make the resulting changes in δ detectable by atomic force microscopy (AFM), it was important to choose the thickness H of the top crystal and the channel width w carefully (see Fig. 1a). As described in Methods (‘Remnant sagging above the condensation transition’), these two parameters define the stiffness of the top crystal and, hence, how deeply it bends inwards. We found that, for w ≈ 150 nm, the top crystal should be ∼50–70 nm thick to exhibit a sagging depth δ of several ångströms. If either w or H were changed only by a factor of 2, the strong dependence δ ∝ w⁴/H³ resulted in either collapsed channels (the top crystal attached to the bottom one) or such a small δ (<1 Å) that the condensation transition was impossible to discern by AFM. The capillaries studied here were typically 5–10 μm long.
As shown in Fig. 1a and Extended Data Fig. 1c, our capillary devices were assembled on top of a silicon nitride membrane. It had a rectangular opening that was extended into the bottom crystal by dry etching. The Si chip supporting the entire assembly was used to separate two miniature gas chambers that were integrated into an AFM set-up as shown in Extended Data Fig. 2a. The bottom chamber provided variable humidity so that one entrance of the 2D capillaries was exposed to a chosen RH. The opposite entrance was facing the top chamber, which enclosed the AFM scanning head and was usually kept at low humidity. The two-chamber configuration allowed us to avoid the influence of RH on measurements of the top crystal’s topography (for example, no condensation occurred at the AFM tip during scanning) (ref. 31). Examples of AFM imaging for mica and graphite devices are given in Fig. 1c, d and in Extended Data Fig. 3, respectively. They reveal pronounced sagging under dry conditions, which disappeared in high humidity. Typical evolution of the top crystal’s profiles with changing RH is shown in Fig. 1e and Extended Data Figs. 4 and 5. In these measurements, we increased RH inside the bottom chamber in steps of 5%, waited for an
[Fig. 2 plot residue: panels a and b, y axis Relative humidity RH_C and RH_K (%), x axis Channel height (Å); annotation θ = 85°.]
Fig. 2 | Condensation transition under extreme 2D confinement. a, Relative humidity RH_C required for water condensation in mica channels of different heights h. Blue circles indicate experimental observations; their size reflects the 3.5% experimental uncertainty in determining RH_C (Methods, ‘AFM topography under controlled humidity’). Two solid curves indicate RH_K given by equation (1) with bulk water’s characteristics for the range of possible θ for mica (colour-coded). The upper curve (open black circles), with its own y axis and the common x axis, shows our MD calculations for changes in γ_SL caused by restructuring of water inside 2D channels (θ ≈ 10°). The arrows mark the energy minima that correspond to the integer number of water monolayers that can fit inside the 2D capillaries. Red symbols (connected by the dashed curve) are the expected behaviour calculated using the oscillating γ_SL shown in the upper curve and equation (2). Black dashed curve, same analysis but assuming fully flexible capillary walls allowing relaxation into the energy minima at commensurate h. Green filled circles, same analysis but for a finite rigidity of the confining walls. b, Same as a, but for graphite capillaries. The simulated curves are for θ ≈ 85°.
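The solid RH_K curves referred to in this caption follow from the bulk Kelvin equation with d = h/cos θ. A minimal sketch of that calculation is given below, using the bulk constants quoted in the text and T = 294 K; it ignores wetting films and any change in water’s properties under confinement, and the function names are illustrative.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
SIGMA = 0.073             # surface tension of water, J/m^2 (73 mJ/m^2)
RHO_N = 3.3e28            # number density of water, 1/m^3


def kelvin_length_m(temperature_k=294.0):
    """Characteristic scale 2*sigma / (k_B * T * rho_N)."""
    return 2.0 * SIGMA / (K_B * temperature_k * RHO_N)


def kelvin_rh(height_m, contact_angle_deg, temperature_k=294.0):
    """Relative humidity (0 to 1) at which a slit of height h fills, for contact angle below 90 degrees."""
    meniscus_diameter = height_m / math.cos(math.radians(contact_angle_deg))
    return math.exp(-kelvin_length_m(temperature_k) / meniscus_diameter)


print(f"2*sigma/(k_B*T*rho_N) = {kelvin_length_m() * 1e9:.2f} nm")
for theta in (0.0, 20.0, 85.0):
    print(f"h = 1.5 nm, theta = {theta:>4.0f} deg: RH_K = {100 * kelvin_rh(1.5e-9, theta):.0f}%")
```

For strongly hydrophilic walls the result changes little between θ = 0° and 20°, whereas near 85° the predicted humidity is much higher and varies steeply with θ, which is the contrast between mica and graphite drawn in the text.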
hour for the system to stabilize and then recorded AFM images. The temperature was kept at 294 ± 1 K. For the device in Fig. 1e, the sagging remained practically constant for RH ≤ 75% and then exhibited a pronounced jump at RH_C, which we attribute to the condensation transition (another example is shown in Extended Data Fig. 5). Further increase in RH led to a gradual decrease in δ such that the top crystal became practically flat at RH > 95% (Fig. 1). The remnant sagging at RH > RH_C is well described by the negative capillary pressure which keeps the top crystal bent inwards even after water has filled the 2D channels, suppressing the adhesion of the top crystal to the sidewalls. Indeed, for RH > RH_C, δ is expected to evolve proportionally to ln(RH) and reach zero at 100% humidity (refs. 23,24), in agreement with the observed behaviour in Fig. 1e and Extended Data Fig. 6 (Methods, ‘Remnant sagging above the condensation transition’). If we repeated the same measurements but with decreasing RH, a reverse jump occurred at the same RH_C, that is, the condensation transition was non-hysteretic (Extended Data Fig. 4a; Methods, ‘Non-hysteretic behaviour of the condensation transition’). Note, however, that it could take up to several days for capillaries exposed to high RH to dry out completely and return to their original state (Extended Data Fig. 4b). On the other hand, for measurements with increasing RH, no difference in RH_C was observed after either an hour or days of equilibration. Accordingly, our experiments were normally carried out with increasing rather than decreasing RH, as in Fig. 1e.
Figure 2 summarizes our results for the condensation transitions observed in mica and graphite 2D capillaries. To allow more accurate comparison between data collected from different devices, we have accounted for the fact that capillaries with the same N often exhibited different sagging in their dry state, δ₀. For capillaries with large δ₀, we observed consistently lower RH_C than for those with small initial sagging and same N. Moreover, comparing capillaries with different N but similar channel heights h = Na − δ₀, we found close values of RH_C (Extended Data Fig. 5). This implies that it was the narrowest, central region of the 2D channels that determined the onset of condensation, in agreement with general expectations (ref. 32) (Methods, ‘Effect of initial sagging’). Accordingly, to account for the effect of different δ₀, Fig. 2 plots RH_C as a function of h rather than of N. For mica capillaries, the experimental data are well described by equation (1) using θ and σ of bulk water. Because RH_K(h) depends little on the exact value of θ for strongly hydrophilic capillaries (Fig. 2a), the comparison of RH_C for mica with equation (1) is straightforward. This is not the case for weakly hydrophilic graphite, for which relatively small variations in θ lead to considerable changes in RH_K(h) as per equation (1). Nonetheless, the values of RH_C observed for our graphite capillaries fall well within the range expected from the Kelvin equation using the contact angles θ = 80 ± 5°, typical for graphite surfaces under ambient conditions (ref. 29).
It is surprising that the macroscopic Kelvin equation using the characteristics of bulk water describes condensation in our mica capillaries so well and also provides qualitative agreement for the graphite capillaries. As mentioned in the introduction, strong discrepancy is expected for the ångström-scale confinement where only one or two layers of water fit inside capillaries. Before trying to explain the unexpected agreement between the experiment and the macroscopic Kelvin equation, we note that RH_C values in Fig. 2a are notably lower than the RH values required to achieve condensation in the previous studies for d ≥ 8 nm. At our low RH, no continuous wetting layer is expected even on fresh mica surfaces (refs. 12,33), and a partial coverage by monolayer water is probably suppressed further by adsorbates from air, which are responsible for the relatively large θ close to 20°. The same consideration about the apparent absence of wetting films also applies for the graphite capillaries in which the wetting transition is even less likely (refs. 1,2,28). Second, to avoid the macroscopic variables σ and θ that are poorly defined under our extreme confinement, the Kelvin equation can be rewritten as (refs. 1,2,18)
$$\mathrm{RH_K} = \exp\left[-\frac{2(\gamma_\mathrm{SV} - \gamma_\mathrm{SL})}{h k_\mathrm{B} T \rho_\mathrm{N}}\right] \qquad (2)$$
where γ_SV and γ_SL are the surface energies for solid–vapour and solid–liquid interfaces, respectively, and γ_SV − γ_SL = σ cos θ. The energy γ_SV
for our mica and graphite capillaries, respectively. Note that the former value lies in the middle of the contact-angle interval observed for mica (ref. 28) and, importantly, our MD results exhibited little sensitivity to the exact θ for strongly hydrophilic capillaries, as expected from the cos θ dependence.
Having established parameters for the desired contact angles, we proceeded to another simulation set-up that consisted of two flat four-layer graphite sheets immersed in a water box containing 40,000 molecules. The dimension of each graphene sheet was 102.2 × 100.9 Å² whereas the water box was 140.0 × 140.0 Å² in size, which allowed water molecules confined between the rigid graphite sheets to exchange easily with outside molecules. After an equilibration run of 1.0 ns, the two sheets were brought progressively closer in steps of 0.2 Å. Each time the system was equilibrated for 0.1 ns and its total potential energy was calculated for further analysis. Periodic boundary conditions were
Acknowledgements This work was funded by Lloyd’s Register Foundation, the European Research Council, Graphene Flagship and the Royal Society. Q.Y. acknowledges support from the Leverhulme Early Career Fellowship, and F.C.W. from the Strategic Priority Research Program of the Chinese Academy of Sciences (XDB22040402) and the CAS Youth Innovation Promotion Association.
Author contributions A.K.G. suggested the project and directed it together with Q.Y. Q.Y. and P.Z.S. fabricated devices. Q.Y. performed measurements and carried out data analysis with help from L.F., Y.V.S., S.J.H. and Z.W.Z. F.C.W. provided theoretical support. A.K.G., Q.Y., F.C.W. and I.V.G. wrote the manuscript. All authors contributed to discussions.
Competing interests The authors declare no competing interests.
Additional information
Correspondence and requests for materials should be addressed to Q.Y., F.C.W. or A.K.G.
Peer review information Nature thanks Patrick Huber and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Extended Data Fig. 1 | Nanofabrication of 2D channels. a, Simplified flow
chart for our fabrication procedures. (1) Graphene spacers and the bottom
crystal of either mica or graphite (shown in yellow) were assembled on top of an
oxidized Si wafer. (2) A suspended SiN membrane with a rectangular hole was
prepared separately. (3) The two-layer assembly was transferred from the Si
oxide wafer onto the SiN membrane. The opening was extended through the
assembly by RIE. (4) The top crystal of mica or graphite was placed on top of
graphene spacers. b, AFM micrograph of graphene spacers with N = 5. The
colour scale is given by the height profile (blue curve). c, Optical image of a final
mica device used in our experiments. The bottom mica crystal shows up in
purple on top of the square SiN membrane. Graphene spacers (N = 3) and the
top mica layer are outlined in blue and yellow, respectively. d, Cross-sectional
scanning transmission electron microscopy image of a graphite channel with
N = 2. The blue ticks mark the channel’s edges.
Article
Extended Data Fig. 4 | Non-hysteretic capillary condensation with slow
dynamics. a, Sagging profiles for a graphite capillary (N = 4) with increasing
and decreasing RH between 75% and 80%. Black curve, initial dry-state profile.
Red curve, RH was increased to 80%. Then, RH was returned to 75% and
maintained at this humidity. AFM profiles were taken after 4 h, 9 h and 16 h
(colour coded). b, The N = 6 graphite capillary was brought from the dry state
(black curve) into the state filled with water and kept for an hour at 95% RH
(red). The humidity was then decreased to ∼30%, well below the condensation
transition observed at 62.5 ± 2.5% for this device. The colour-coded curves
show the time evolution towards the original dry state. Note that the sagging
depths δ for such hysteretic loops were highly reproducible but details of
sagging profiles could differ in different RH cycles. For example, the top
crystal’s adhesion to the right wall was different in the original and final dry
states, as seen in a (compare black and purple curves). This hysteresis is
attributed to irreproducible vdW attachments of top crystals to channel
sidewalls.
Extended Data Fig. 5 | Capillary condensation in 2D channels with different
initial sagging. a, b, Sagging profiles for two N = 5 graphite capillaries with
different δ0. RH was increased in 5% steps (colour coded). The water
condensation transition occurred between 80% and 85% RH in a and between
70% and 75% in b. The difference in RHC for the same N is attributed to different
h in the two cases.
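The dependence of the condensation humidity on the channel height h can be illustrated with the Kelvin relation of equation (2), written with γSV − γSL = σ cos θ. The following is only a rough sketch under stated assumptions: the surface tension, molecular number density and temperature are bulk-water values chosen here for illustration, θ = 20° is taken from the text for mica, and the function name kelvin_rh is hypothetical. As the text cautions, σ and θ are poorly defined under such confinement, so the numbers indicate a trend rather than a prediction.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def kelvin_rh(h_m, sigma=0.072, theta_deg=20.0, temp_k=295.0, rho_n=3.34e28):
    """Relative humidity (fraction) at which a slit of height h_m (metres) fills.

    sigma (N/m), temp_k (K) and rho_n (molecules per m^3) are assumed
    bulk-water values; theta_deg is the assumed contact angle.
    """
    # gamma_SV - gamma_SL expressed through the macroscopic sigma * cos(theta)
    delta_gamma = sigma * math.cos(math.radians(theta_deg))
    return math.exp(-2.0 * delta_gamma / (h_m * K_B * temp_k * rho_n))

# Smaller channel heights give condensation at lower humidity
for h_nm in (0.5, 1.0, 2.0, 4.0):
    rh = kelvin_rh(h_nm * 1e-9)
    print(f"h = {h_nm:.1f} nm -> RH_K = {100 * rh:.0f}%")
```

In this sketch a smaller h always lowers the computed RH_K, which is the same direction as the lower RHC reported for capillaries with larger initial sagging.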
Article
Accepted: 27 October 2020
Published online: 3 November 2020
Hydroamination of alkenes, the addition of the N–H bond of an amine across an
alkene, is a fundamental, yet challenging, organic transformation that creates an
alkylamine from two abundant chemical feedstocks, alkenes and amines, with full
atom economy1–3. The reaction is particularly important because amines, especially
chiral amines, are prevalent substructures in a wide range of natural products and
drugs. Although extensive efforts have been dedicated to developing catalysts for
hydroamination, the vast majority of alkenes that undergo intermolecular
hydroamination have been limited to conjugated, strained, or terminal alkenes2–4;
only a few examples occur by the direct addition of the N–H bond of amines across
unactivated internal alkenes5–7, including photocatalytic hydroamination8,9, and no
asymmetric intermolecular additions to such alkenes are known. In fact, current
examples of direct, enantioselective intermolecular hydroamination of any type of
unactivated alkene lacking a directing group occur with only moderate
enantioselectivity10–13. Here we report a cationic iridium system that catalyses
intermolecular hydroamination of a range of unactivated, internal alkenes,
including those in both acyclic and cyclic alkenes, to afford chiral amines with
high enantioselectivity. The catalyst contains a phosphine ligand bearing
trimethylsilyl-substituted aryl groups and a triflimide counteranion, and the reaction
design includes 2-amino-6-methylpyridine as the amine to enhance the rates of
multiple steps within the catalytic cycle while serving as an ammonia surrogate.
These design principles point the way to the addition of N–H bonds of other
reagents, as well as O–H and C–H bonds, across unactivated internal alkenes to
streamline the synthesis of functional molecules from basic feedstocks.
Chiral amines are essential structural motifs in numerous active pharmaceutical
ingredients and in many agrochemicals and materials. They also serve as chiral catalysts,
resolving reagents and chiral auxiliaries14. Thus, efficient methods to prepare chiral
amines have been long sought. Traditional approaches15 include chemical16,17 and
enzymatic18 reductive amination, hydrogenation19, nucleophilic addition to imines20 and
nucleophilic substitution21,22. However, these methods require starting materials
containing reactive functionality that is often derived from feedstock alkenes. Thus,
hydroamination of alkenes is the most direct method to construct chiral amines from a
functional group that is present in basic feedstocks and is typically unprotected.
Asymmetric additions to conjugated alkenes, such as dienes23–26 and vinylarenes27,28, have
been reported, but the scope of the direct addition of N–H bonds to more common and less
reactive unconjugated alkenes is severely limited, and the enantioselectivities of
asymmetric processes are far lower than those that would enable applications for the
synthesis of chiral amines. Formal hydroaminations provide an alternative approach to
this problem, but the use of silanes with electrophilic aminating reagents29 and even
metal reductants30, instead of amines, undermines the atom economy of the hydroamination
reaction. Direct N–H additions of unactivated internal alkenes that occur with high
enantioselectivity are unknown.
There are considerable challenges facing the development of catalytic hydroaminations of
unactivated alkenes that bear more than one substituent (Fig. 1a). Both experiments and
theoretical studies have shown that the thermodynamic driving force is weak and the
kinetic barrier to combining two nucleophiles is high1,31. Moreover, catalysts for
hydroamination often catalyse undesirable, competing alkene isomerization, and
isomerization is typically faster than addition of the N–H bond during many
metal-catalysed hydroaminations (Fig. 1b). Such relative rates lead to a mixture of
isomeric products32,33. Many catalysts for alkene hydroamination also promote formation
of oxidative amination products by β-hydrogen elimination of β-aminoalkylmetal
intermediates34,35. These isomerization and oxidative side reactions must be suppressed
to achieve hydroamination of unactivated internal alkenes. Finally, because
hydroamination is usually almost ergoneutral, isomerization and racemization of the
products during the reaction can erode regioselectivity and enantioselectivity36.
To address these challenges, we modified a neutral iridium complex containing a
bisphosphine ligand, which was previously shown to catalyse the formation of
hydroamination and oxidative amination products from the reaction of terminal alkenes
with amides and indoles34,37–40. We have previously shown that these aminations occur by
a mechanism
1Department of Chemistry, University of California, Berkeley, CA, USA. 2Division of Chemical Sciences, Lawrence Berkeley National Laboratory, Berkeley, CA, USA. ✉e-mail: jhartwig@berkeley.edu
Fig. 1 | Catalytic asymmetric hydroamination of unactivated internal
alkenes. a, Long-standing challenges and previous strategies for catalytic
hydroamination of unactivated internal alkenes. b, Alkene isomerization that
leads to a mixture of constitutional isomeric products. c, Design of a cationic
iridium catalyst and an ammonia surrogate based on 2-aminopyridine to
achieve asymmetric hydroamination of internal alkenes. OA, oxidative
addition. RE, reductive elimination.
comprising oxidative addition of N–H bonds, migratory insertion of alkenes and reductive
elimination of C–H bonds34. We envisioned that switching the iridium catalyst from
neutral to cationic would lead to the formation of cationic iridium intermediates that
would undergo migratory insertion of the alkene more rapidly41, a step that has been
shown to be rate-limiting in the reaction of terminal alkenes with the neutral
catalyst34 (Fig. 1c). If binding and insertion of the alkene were sufficiently enhanced,
then the reaction scope might include internal alkenes. In addition, we envisioned that
coordinating groups adjacent to the amine (for example, 2-aminopyridine derivatives)
could facilitate and make more thermodynamically favourable the initial N–H oxidative
addition, and such enhancements of this step could be important because the rate of
oxidative addition of electron-deficient metal complexes is generally slower than that
of electron-rich ones42. If properly placed, the coordinating groups of the amine could
also form a rigid six-membered iridacycle upon insertion of the alkene; this geometry
could suppress β-hydrogen elimination to form enamines.
Such a combination of 2-aminopyridine and a cationic iridium catalyst was evaluated12 for
the hydroamination of terminal vinylarenes, but only briefly for the hydroamination of
unactivated α-alkenes. Low enantioselectivities (≤11% enantiomeric excess, e.e.) were
observed, and the reactions of unstrained internal alkenes were not examined. We
reasoned that a substituent near the binding group of the amine could modulate the
strength of the coordination of the pyridine, leading to potential enhancement of the
activity of the iridium catalyst, due to weakened coordination (Fig. 1c). Finally, the
pyridyl group of the product could be cleaved by known methods to reveal the
corresponding primary amine43.
Figure 2 summarizes our studies on the development of a cationic iridium catalyst and an
ammonia surrogate for the asymmetric hydroamination of unactivated internal alkenes.
These experiments revealed the effect of the reaction components of the system on the
reaction yield, level of alkene isomerization and enantioselectivity. To identify a
suitable ammonia surrogate for this reaction, we surveyed a series of heteroaromatic
amines and one control arylamine that possess varying structural properties. The
hydroamination products from these reactions consisted of three isomers (denoted A, B
and C) that probably resulted from competing alkene isomerization and hydroamination. A
larger amount of A reflects a faster rate for direct addition of the amine to the
starting alkene than alkene isomerization before addition.
As shown in Fig. 2a, hydroamination of cis-4-octene in the presence of [Ir(coe)2Cl]2,
(S)-DTBM-SEGPHOS
((S)-(+)-5,5′-bis[di(3,5-di-tert-butyl-4-methoxyphenyl)phosphino]-4,4′-bi-1,3-benzodioxole)
and NaBARF (sodium tetrakis[3,5-bis(trifluoromethyl)phenyl]borate) occurred only when
6-substituted 2-aminopyridine derivatives were used as the amine source. The
hydroamination of 2-amino-6-methylpyridine formed amines A, B and C in a combined 73%
yield, with A constituting 28% of the amine products (defined as A/(A + B + C)). The
reaction of 2-amino-6-trifluoromethylpyridine afforded only 5% of isomer A. Other amines
tested, including the parent 2-aminopyridine, did not undergo this hydroamination. These
results suggest that the substituent in the 6-position of the pyridine ring is essential
to promote the hydroamination.
To suppress competing alkene isomerization and improve the reaction yield, we examined
catalysts containing a series of bisphosphine ligands and counteranions for
hydroamination with 2-amino-6-methylpyridine as the amine. Studies of the electronic and
steric properties of ligands indicated that the ligands that are less electron-rich
(Fig. 2b, entry 2) and more sterically encumbered (Fig. 2b, entries 3–5) than
(S)-DTBM-SEGPHOS form catalysts that are more reactive and selective for direct addition
to form amine 1. The observed higher reactivity of catalysts generated from more
electron-poor ligands probably results from a greater rate of migratory insertion of the
alkene (see below) into the Ir–N bond of complexes containing L2–L5 lacking the methoxy
group than into the Ir–N bond of the complex ligated by (S)-DTBM-SEGPHOS possessing the
p-OMe group44. The higher selectivity from the more hindered ligands could
Ligands: (S)-DTBM-SEGPHOS (L1); rac-DTB-SEGPHOS (R = tBu) (L2); rac-TMS-SEGPHOS
(R = SiMe3) (L3); (R)-TMS-SYNPHOS (new ligand) (L4); (R)-TMS-MeOBIPHEP (L5).
Scheme conditions: 100 °C.
Entry   L*                       X      4-Selectivityd   Yield (1+1′+1″)
1e      (S)-DTBM-SEGPHOS (L1)    BARF   48%              16%
2e      rac-DTB-SEGPHOS (L2)     BARF   50%              42%
9f      (S)-DTBM-SEGPHOS (L1)    NTf2   63%              60% (–95% e.e.)
10      (S)-DTBM-SEGPHOS (L1)    Cl     n/a              0%
11f,g   (R)-TMS-SYNPHOS (L4)     NTf2   89%              78% (97% e.e.)
Fig. 2 | Development of asymmetric hydroamination of unactivated
internal alkenes with 2-amino-6-methylpyridine as an ammonia
surrogate. a, Identification of suitable ammonia surrogates to enable
hydroamination of unactivated internal alkenes. b, Identification of reaction
conditions to achieve asymmetric hydroamination and to suppress alkene
isomerization. aCombined yield. bNo reaction. cDefined as A/(A + B + C).
d4-Selectivity defined as 1/(1 + 1′ + 1″). eConditions: 2.5 mol% [Ir(coe)2Cl]2,
6 mol% ligand, 6 mol% NaBARF, toluene. fConditions: 5 mol% [L*Ir(COD)]X,
toluene. gIn 2-MeTHF.
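The selectivity measures used in Fig. 2, A/(A + B + C) and 1/(1 + 1′ + 1″), together with the enantiomeric excess, are simple ratios that can be sketched as follows. This is a minimal illustration rather than anything taken from the paper: the function names are hypothetical, and the e.e. formula is the standard definition (major − minor)/(major + minor), which the text does not spell out.

```python
def selectivity(direct, *isomers):
    """Fraction of the direct-addition product among all amine isomers,
    for example A/(A + B + C) or 1/(1 + 1' + 1'')."""
    total = direct + sum(isomers)
    if total <= 0:
        raise ValueError("no amine product formed")
    return direct / total

def enantiomeric_excess(major, minor):
    """Standard definition: e.e. = (major - minor)/(major + minor), as a fraction."""
    return (major - minor) / (major + minor)

def enantiomer_ratio(ee):
    """Convert an e.e. (fraction, below 1) into a major:minor ratio."""
    return (1 + ee) / (1 - ee)

# Values quoted in the text: 73% combined yield, A being 28% of the amines
yield_of_a = 0.73 * 0.28
print(f"yield of isomer A alone: {100 * yield_of_a:.0f}%")
print(f"97% e.e. corresponds to about {enantiomer_ratio(0.97):.0f}:1")
```

Multiplying the combined yield by the selectivity gives the yield of the direct-addition isomer alone, which is a convenient single number for comparing table entries.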
result from a greater sensitivity of the rate of insertion of the alkene into the
metal–amido bond (Fig. 1c, structure II) to steric effects than the rate of insertion
into the metal–hydride bond, which probably leads to alkene isomerization45,46. Studies
with various counterions revealed that reactions with triflimide as the counterion of
the catalyst occurred with the highest selectivity at high conversion (Fig. 2b, entries
6–9). The origin of the effects of the counterions is difficult to ascertain, but an
effect is clear. Control experiments showed that the reaction catalysed by a neutral
iridium complex under otherwise identical conditions (Fig. 2b, entry 10) resulted in the
exclusive formation of oxidative amination products. By combining the chiral ligand that
led to the highest selectivity with the triflimide anion in the form of
[((R)-TMS-SYNPHOS)Ir(COD)]NTf2, the model hydroamination reaction formed the
4-aminooctane (1) in high yield; this reaction also occurred with remarkably high
enantioselectivity (Fig. 2b, entry 11). Thus, the substituent on the amine, the new
phosphine ligand and the use of a triflimide counterion all led to the high activity,
chemoselectivity and enantioselectivity of the reaction.
With this catalyst and reagent, we examined the scope of alkenes that underwent
hydroamination (Fig. 3). Both symmetrical internal alkenes and unsymmetrical internal
alkenes underwent hydroamination with 2-amino-6-methylpyridine. The reactions all
proceeded in greater than 90% e.e. Hydroamination of symmetrical alkenes afforded
products containing linear alkyl groups (1), aryl-substituted alkyl groups (2), branched
alkyl groups (3), silyloxy groups (4), alkoxy groups (5) and alkoxycarbonyl groups (6) in
good to high yields. Reactions of unsymmetrical internal alkenes bearing polar
functional groups at the homoallylic position occurred with synthetically useful
regioselectivity (2:1 to 10:1). These functional groups include phthalimidyl groups
(7, 8), sulfonamido groups (9), silyloxy groups (10), bis(ethoxycarbonyl)methyl groups
(11) and (hetero)aryloxy groups (12–14). These groups presumably influence the
regioselectivity by inductive effects that are similar to those observed for other
classes of functionalization of unsymmetrical internal alkenes47–49.
The reactivity of the system for Z-alkenes enabled the hydroamination of cyclic alkenes,
and these reactions also occurred with high enantioselectivity. The combination of
[Ir(coe)2Cl]2, (S)-DTBM-SEGPHOS and NaBARF was used as the catalyst because it was more
reactive than [((R)-TMS-SYNPHOS)Ir(COD)]NTf2 for the hydroamination of cyclic alkenes.
The cyclic hydrocarbons cyclopentene, cyclohexene, cycloheptene and cyclooctene all
underwent hydroamination in high yields (15–18). The hydroamination of a series of
substituted cyclopentenes formed chiral amine products with high enantioselectivity
(19–23, 90–92% e.e.). The reaction to form the 1,3-substituted cyclopentane 23 from the
4-methoxycarbonyl-substituted cyclopentene occurred with high diastereoselectivity for
the trans product. In addition, cyclohexene derivatives with 3,3-substituents underwent
hydroamination to afford two products with regioselectivity of approximately 1:3
(24/24′, 25/25′). Although the major isomer is achiral, the chiral minor isomer was
formed with good to high enantioselectivity. We observed in some cases that the
hydroaminations of substituted cycloalkenes with rings larger than that of cyclopentene
occurred to high conversion, but were
Fig. 3 | Scope of internal alkenes that undergo hydroamination. a, Scope of
asymmetric hydroamination of acyclic internal alkenes. b, Scope of
hydroamination of simple cycloalkenes and of asymmetric hydroamination of
substituted cyclic alkenes. c, Products from the removal of the 2-(6-methyl)pyridyl
group. a2.5 mol% catalyst. b7.5 mol% catalyst. c20 mol% catalyst.
dConditions: 2.5 mol% [Ir(coe)2Cl]2, 7.5 mol% (S)-DTBM-SEGPHOS, 6 mol%
NaBARF, 1,4-dioxane, 120 °C. eConditions: PtO2, HCl, H2 (1 atm); NaBH4,
THF/EtOH. fEnantioselectivities were determined after conversion to the
original hydroamination product by palladium-catalysed cross-coupling of the
primary amine with 2-bromo-6-methylpyridine.
complicated by competing alkene isomerization, which led to mixtures of isomers that
were difficult to separate.
The pyridyl group of the hydroamination products (1, 9) was cleaved by a short sequence
that consisted of protonation, hydrogenation and borohydride reduction. The
corresponding primary amines (26, 27) formed in 71–85% yield with little or no erosion
in enantiomeric excess (see Supplementary Information sections VI and X). By the same
sequence, hydroamination product 6 was converted to the corresponding δ-lactam (28) in
79% yield.
To understand how the combination of a cationic iridium catalyst and
2-amino-6-methylpyridine enabled the hydroamination of unactivated internal alkenes, we
investigated the reaction mechanism. The reaction of a substituted cyclopentene with
N,N-dideuterio-2-amino-6-methylpyridine showed that the addition occurred in a syn
fashion. This stereochemistry is consistent with a mechanism that involves migratory
insertion of an alkene, rather than nucleophilic attack on a metal-bound alkene complex
(Fig. 4a). Kinetic experiments showed that the reaction is first-order in
Panel b, kinetic plots, catalyst [(R)-TMS-SYNPHOSIr(COD)]NTf2 in 2-MeTHF at 120 °C.
Linear fits: versus 1/[Alkene] (M–1), y = 5.7x + 1.4; versus [Amine] (M),
y = 2.5x + 0.14; versus [Ir] (M), y = 2.6 × 10^3x – 4.5. Vertical-axis label:
1/(Initial rate) (×10–6 s M–1).
Panel c, 5 mol% [(R)-TMS-SYNPHOSIr(COD)]NTf2, 2-MeTHF, 120 °C, 48 h: no reaction.
Panel d, Ir–Npy (Å): 2.14 (TS-1a), 2.42 (TS-2a). Ir–Nam (Å): 2.32 (TS-1a), 2.14 (TS-2a).
ΔΔG‡ = 0 kcal mol–1 (TS-1a), ΔΔG‡ = 4.1 kcal mol–1 (TS-2a).
Fig. 4 | Mechanistic study of the hydroamination. a, Deuterium-labelling
experiments. b, Experiments to reveal kinetic orders of each reaction
component. c, Competition experiments using 2-amino-6-methylpyridine and
2-aminopyridine. d, Transition-state structures of alkene migratory insertion
computed by DFT. Single-point energies were computed at the
M06/6-311+G(d,p)/SDD/SMD(1,4-dioxane) level of theory with structures optimized at
the B3LYP/6-31G(d)/SDD level. The ligand used for the calculations is
(S)-TMS-SEGPHOS. The hydride is located trans to the amido ligand in the
lowest-energy transition state (TS-1a) that leads to the (R)-enantiomer but is
located trans to the pyridyl group in the lowest-energy transition state (TS-2a)
that leads to the (S)-enantiomer. Error bars in b correspond to those in the
initial rates that result from errors in the integration of peaks in the gas
chromatogram.
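The computed gap of ΔΔG‡ = 4.1 kcal mol–1 between TS-1a and TS-2a can be translated into an expected enantiomeric excess with a short calculation. The sketch below assumes that the product ratio is set purely by the Boltzmann factor of the two lowest transition states (Curtin–Hammett conditions) and that the relevant temperature is the 120 °C reaction temperature; both are assumptions made here for illustration, not statements from the paper, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant, kcal mol^-1 K^-1

def ee_from_ddg(ddg_kcal, temp_c):
    """Predicted e.e. (fraction) from the free-energy gap between the two
    lowest diastereomeric transition states, assuming the enantiomer ratio
    equals exp(ddG / RT)."""
    ratio = math.exp(ddg_kcal / (R_KCAL * (temp_c + 273.15)))
    return (ratio - 1.0) / (ratio + 1.0)

# 4.1 kcal/mol from the DFT panel; 120 C is the assumed temperature
predicted = ee_from_ddg(4.1, 120.0)
print(f"predicted e.e.: {100 * predicted:.1f}%")
```

Under these assumptions the estimate comes out close to the 97% e.e. measured for the model reaction, although the ligand used in the calculations differs from the one used in that experiment.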
[iridium catalyst], positive-order in [cis-4-octene] and inverse-order in
[2-amino-6-methylpyridine] (Fig. 4b). These data suggest that a molecule of
2-amino-6-methylpyridine dissociates reversibly from iridium in the catalyst resting
state before rate-limiting insertion of the alkene, presumably into the metal–amido
bond41. To elucidate why the methyl group of 2-amino-6-methylpyridine is essential in
this reaction, we conducted hydroamination of cis-4-octene with equimolar amounts of
2-amino-6-methylpyridine and 2-aminopyridine. Whereas the reaction of
2-amino-6-methylpyridine alone afforded the hydroamination product in high yield, the
reaction that contained both 2-amino-6-methylpyridine and 2-aminopyridine provided
neither hydroamination product (Fig. 4c). This result implies that stronger
Acknowledgements The enantioselective aspects of the work were supported by the National
Institutes of Health under grant R35GM130387 and the catalyst development was supported
by the Director, Office of Science, of the US Department of Energy under contract number
DE-AC02-05CH11231. Calculations were performed at the Molecular Graphics and
Computation Facility at UC Berkeley funded by the NIH (S10OD023532). We gratefully
acknowledge Takasago for gifts of (S)-DTBM-SEGPHOS, and H. Celik for assistance with
nuclear magnetic resonance (NMR) experiments. Instruments in the College of Chemistry
NMR facility are supported in part by NIH S10OD024998. We thank R. G. Bergman, B. Su and
T. Butcher for discussions. Y.X. thanks Bristol-Myers Squibb for a graduate fellowship,
S. Pedram for supply of NaBARF and D. Small for assistance with DFT calculations.
Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2919-z.
Correspondence and requests for materials should be addressed to J.F.H.
Peer review information Nature thanks the anonymous reviewer(s) for their contribution to the
peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Article
Quantification of an efficiency–sovereignty
trade-off in climate policy
https://doi.org/10.1038/s41586-020-2982-5
Nico Bauer1 ✉, Christoph Bertram1, Anselm Schultes1, David Klein1, Gunnar Luderer1,2,
Elmar Kriegler1, Alexander Popp1 & Ottmar Edenhofer1,2,3
Received: 13 March 2020
The Paris Agreement (PA) set out ambitious targets for climate change stabilization that
require a coordinated effort to limit and reduce greenhouse gas (GHG) emissions.
Negotiations of an ambitious international emissions reductions framework building on
the PA need to consider at least three criteria. First, fair effort sharing requires
that unfair outcomes are avoided, such as regressive policy effects that imply higher
relative effort by low-income countries. Second, cost efficiency (minimizing the global
aggregate mitigation costs) requires equalization of marginal abatement costs across
countries for an incremental ton of carbon dioxide (CO2) avoided. Finally, in this
study, national sovereignty focuses on nation states’ aim to maintain governing control
of economic resources by limiting international transfer payments6,7,10,11. In view of
the deep differences in economic, structural and technological development across
countries1,12,13 (Fig. 1a–d), international climate policies have to balance these
criteria to achieve the PA targets.
Sovereignty versus equity trade-off
So far, most analyses of the PA climate targets have applied cost-efficient policy
frameworks, implying uniform carbon prices across regions, at least in the medium term.
Although the uniform price sets the upper bound for the marginal abatement costs of
mitigation measures that are undertaken as equal for each region, the resulting relative
emissions reductions and mitigation costs vary across regions (Fig. 1e, f).
1Potsdam Institute for Climate Impact Research (PIK), Member of the Leibniz Association, Potsdam, Germany. 2Technical University of Berlin, Berlin, Germany. 3Mercator Institute on Global
Figure legend. Region: Asia, Latin America, Middle East and Africa, OECD, Reforming
economies. Models: AIM/CGE, BET, GCAM, GRAPE, IMAGE, MESSAGE–GLOBIOM, POLES,
REMIND–MAgPIE, TIAM, WITCH–GLOBIOM. Axis labels: Emissions (billion tCO2); Carbon
productivity (US$ (2010) per tCO2); Regional emissions reduction; Year (1960–2020).
Fig. 1 | World economic development and emission mitigation projections.
a–d, Global socioeconomic developments across five world regions: income
per capita (a), annual CO2 emissions (b), annual CO2 emissions per capita (c) and
carbon productivity (d). e, f, Model results for regional emissions reductions
(e) and regional mitigation costs relative to GDP (f) in 2040 used by the IPCC for
mitigation scenarios. The boxplots show the median, and the 25% and the 75%
quartiles; whiskers show 1.5 times the interquartile range. For details on data
sources and regional aggregates, see Methods and Extended Data Table 1.
Low-income countries with low carbon and energy productivities tend to reduce emissions
by a larger fraction and carry relatively higher mitigation costs than economies with
high carbon and energy productivities14,15. In particular, Asian countries show
systematically higher relative mitigation costs and stronger relative emissions
reductions than Organisation for Economic Co-operation and Development (OECD)
countries. The aggregate region of Middle East and Africa shows higher mitigation costs,
but weaker emissions reduction rates because of high energy and low carbon intensity.
Therefore, without transfers, cost-efficient international climate policies tend to
cause regressive income effects that deepen economic inequality.
The Kyoto Protocol negotiations in 1998 inspired research into cost reductions by
trading emissions permits16. Various subsequent studies investigated the trade-off
between fair effort sharing and national sovereignty of a broad range of initial permit
allocations derived from different equity and sovereignty principles17–19. In
cap-and-trade systems, permit allocations following the equal per-capita allocation
principle imply international transfers2–6 that can reach US$6 trillion present value
globally until 21004. The equal-effort-sharing criterion can require even larger
transfers to avoid regressive income effects2. In general, changes in permit allocation
do not affect the modelled energy and land-use sector transformation because the carbon
price does not change in a globally integrated cap-and-trade system if a given number of
permits is allocated differently20.
The PA, with its system of nationally determined contributions (NDCs), moved regionally
fragmented climate policies to the centre of analysis. The policy ambition of the NDCs
corresponds to regional carbon prices in 2030 ranging from nearly zero to more than
US$100 per tCO2 (refs. 15,21–23). This implies more equitable burden sharing, but
fragmented policy implementation compromises cost efficiency22–25. In addition,
aggregated globally, NDCs fall short of cost-effective emissions reductions required in
2030 to achieve the long-term PA climate targets15,26,27.
So far, studies on effort sharing have assumed full integration of an international
cap-and-trade system and varied permit allocations, which either results in regressive
outcomes or implies large transfers. However, in these studies, the permit allocation
does not affect the cost-optimal energy and land-use transformations. The evaluation of
real-world fragmented policy regimes such as the NDCs shows that they fall short of
required emissions reductions at relatively high global costs that are not shared
equitably15. This study investigates the trade-off between cost efficiency and
sovereignty by exploring the policy options between idealized cap-and-trade systems and
fragmented systems of real-world policies while pursuing the PA to limit warming to well
below 2 °C above pre-industrial levels with equitable effort sharing.
REMIND–MAgPIE and mitigation policies
We develop a policy analysis framework that consistently varies regional policy strength
and international transfers in scenarios that comply with the equal-effort-sharing
principle and the well below 2 °C target. The policy framework is evaluated using the
long-term multi-regional integrated assessment model (IAM) REMIND–MAgPIE, which
Fig. 2 | Carbon pricing and regional effort. a, Regional CO2 prices in 2030 on a log scale. The world average depicts the global average price weighted with the regional CO2 emissions in 2030. b, Total mitigation effort including discounted income losses in NPV and eventual transfers compared with a no-policy baseline for 2020–2100; the discount rate is 5% yr−1. The table summarizes which two out of three criteria are met in each scenario. The differences between components of the uniform pricing schemes represent regional transfers, showing that the six regions at the bottom are donors and the six regions at the top are recipients.
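The net present values (NPVs) in panel b discount annual flows at 5% yr−1 over 2020–2100. A minimal sketch of that discounting follows, assuming annual time steps and 2020 as the base year; both are assumptions, and the flat cost series is invented for illustration and is not model output.

```python
def net_present_value(flows, rate=0.05, base_year=2020):
    """Discount a {year: value} mapping back to base_year at a constant annual rate."""
    return sum(value / (1.0 + rate) ** (year - base_year)
               for year, value in flows.items())


# Hypothetical flat cost of 1.0 (arbitrary units) in every year from 2020 to 2100.
flat_costs = {year: 1.0 for year in range(2020, 2101)}
npv = net_present_value(flat_costs)
print(f"NPV of the flat series: {npv:.2f}")
```

At a 5% rate, flows late in the century carry little weight in the total, which is worth keeping in mind when comparing cumulative costs and transfers across scenarios.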
links the regional model of investment and development REMIND with the model of agricultural production and its impact on the environment MAgPIE⁶⁷. This model has been used to analyse transition pathways²⁸, effort-sharing schemes⁴ and NDCs²¹,²⁵. REMIND–MAgPIE integrates the macroeconomy with the land use and the energy sector in a general equilibrium framework including energy and food trade²⁹. The baseline scenario quantifies a middle-of-the-road scenario (Shared Socioeconomic Pathway 2 (SSP2)) with medium economic growth and partial income convergence (Extended Data Fig. 3)²⁹,³⁰. For instance, measured in market exchange rates, US per-capita income in 2020 is 28 times that of India declining to 13 by 2050 and 5 by 2100. In this study, a carbon budget of 1,300 GtCO2 for the period 2011–2100 is assumed³¹. Short-term policies derived from NDCs are applied in 2020; carbon prices thereafter increase by 5% yr−1 until 2060 and continue growing linearly thereafter. The flattened time profile limits overshoot of the carbon budget. Regional variations in climate policy strength are implemented by varying carbon prices; international transfers are implemented as direct payments. Benefits from avoided climate damages and adaptation costs are not considered in these scenarios (see Methods).

Corner solutions of a policy trilemma
Figure 2 shows that only two of the three criteria of cost efficiency, fairness and sovereignty can be fulfilled simultaneously. The cost-efficient policy with a uniform carbon price (US$56 per tCO2 in 2030) but no transfers leads to mitigation costs of US$12.6 trillion that are distributed regressively across regions (0.30% relative income loss for the European Union (EU) and 3.4% for India). A total of US$4.4 trillion in international transfers is required to neutralize the regressive income effects and achieve equal effort sharing. Alternatively, achieving equal effort sharing without transfers requires regionally differentiated carbon prices. The ratio between the highest and lowest price is 130. The global average carbon price weighted with regional emissions increases to US$225 per tCO2, with a weighted standard deviation of US$400 per tCO2. These deviations from uniform carbon pricing increase global mitigation costs by US$2.6 trillion or 21%. Owing to the nonlinearly shaped regional mitigation cost functions³², the relative increase is much lower than the quadrupling of the average carbon price.

Deriving a trade-off curve
Each of the three corner solutions leads to extreme outcomes in regional mitigation costs, transfers or carbon price differentiation. Therefore, we explore the trade-off between sovereignty and cost efficiency by gradually compressing the carbon price spread, moving from the no-transfer case with differentiated carbon prices towards the uniform carbon price case with transfers. We apply an exponential function to adjust pairs of regional prices $(p_i/p_j)^{\alpha}$ of the no-transfer case for regions i and j. The compression parameter α is varied between zero (uniform prices case) and one (differentiated prices case). The compressed carbon price set is used in REMIND–MAgPIE and jointly rescaled to meet the carbon budget, which leads to variations of global mitigation costs and global residual transfers. This approach does not necessarily determine the frontier of smallest efficiency losses for varying transfer volumes, but derives a trade-off curve that is economically feasible and complies with the PA climate objective and the equal-effort-sharing criterion (see Methods for detail).

Relaxing uniform carbon pricing policies
The resulting trade-off curve is strongly nonlinear (Fig. 3a). Starting at the solution with uniform carbon prices, the steep drop in transfers follows directly from the cost-efficient solution that requires equalization of marginal abatement costs. If non-OECD economies lower their carbon taxes, their mitigation costs decrease by an amount slightly smaller than the cost increase in OECD countries that need to impose higher carbon taxes to balance the global carbon budget. Hence, relatively large transfer reductions cause only small inefficiencies, if the carbon tax spread remains sufficiently small (Methods, Extended Data Fig. 1).
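The argument that a small carbon price spread cuts transfers sharply while adding little inefficiency can be illustrated with a two-region toy model. This is only a sketch under stated assumptions: linear marginal abatement costs, a fixed joint abatement target, and invented incomes and parameters. It is not the REMIND–MAgPIE model, and `two_region_case` is a hypothetical helper.

```python
def two_region_case(p_b, k_a=1.0, k_b=1.0, y_a=10.0, y_b=1.0, target=2.0):
    """Return (price in A, global mitigation cost, equal-effort transfer A -> B).

    Assumed toy model: abatement in region r is k_r * p_r (linear marginal
    abatement cost), so its abatement cost is 0.5 * k_r * p_r**2.
    """
    p_a = (target - k_b * p_b) / k_a          # price in A that still meets the joint target
    cost_a = 0.5 * k_a * p_a ** 2
    cost_b = 0.5 * k_b * p_b ** 2
    # Transfer T that equalizes relative effort:
    # (cost_a + T) / y_a == (cost_b - T) / y_b
    transfer = (y_a * cost_b - y_b * cost_a) / (y_a + y_b)
    return p_a, cost_a + cost_b, transfer


uniform = two_region_case(p_b=1.0)            # both regions face the same price
for p_b in (1.0, 0.9, 0.8, 0.6):
    p_a, cost, transfer = two_region_case(p_b)
    print(f"p_A={p_a:.2f} p_B={p_b:.2f} "
          f"extra cost={cost - uniform[1]:.3f} "
          f"transfer saved={uniform[2] - transfer:.3f}")
```

Because total cost is minimized at the uniform price, the first steps away from it raise cost only to second order, whereas the required transfer falls to first order. That is the mechanism behind the steep initial drop described above.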
[Fig. 3 graphic not reproduced. Panel a: trade-off curve; points are labelled with the 2030 global average carbon price and its weighted standard deviation (68;49, 77;72, 88;102, 106;146, 133;203, 170;286, 195;339, 216;383 and 225;402); case labels: Differentiated; Uniform without transfer. Panel b: cumulative CO2 emissions 2020–2100 (GtCO2) for OECD and non-OECD regions, showing net emissions, emissions from deforestation and FFI, and removals by afforestation, BECCS and DAC. Panel c: timing in OECD and non-OECD regions of five thresholds: coal power capacity shutdown exceeds 50%; electricity share exceeds 30%; carbon removal using BECCS exceeds 250 Mt yr–1; fossil primary energy share falls below 50%; total CO2 emissions turn negative.]
Fig. 3 | Sovereignty versus cost-efficiency trade-off and consequences of differentiated carbon prices. a, The trade-off curve, including the three corner solutions (marked with red circles). The numbers indicate the 2030 global average carbon price and the standard deviation using the regional CO2 emissions as weights. The costs and transfers are NPVs for the period 2020–2100. The time path of transfers is shown in Extended Data Fig. 4 and discussed in Methods. b, The cumulative net carbon emissions in OECD and non-OECD countries differentiated by emissions sources and carbon removals. FFI, fossil fuel and industry; DAC, direct air capture; BECCS, bioenergy with carbon capture and sequestration. c, Different timing of mitigation measures in OECD and non-OECD regions. ‘Partially differentiated’ is the case with an average carbon price of US$63.3 per tCO2. In some scenarios, the threshold is not reached before 2100 and, therefore, no marker is shown.
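Panel a labels each point with an emissions-weighted average carbon price and a weighted standard deviation. A minimal sketch of those two statistics follows. The population-style weighted variance is an assumption, since the caption does not give the exact formula, and the regional prices and emissions are invented for illustration.

```python
import math


def weighted_mean_and_sd(prices, emissions):
    """Emissions-weighted mean price and weighted standard deviation."""
    total = sum(emissions)
    mean = sum(p * e for p, e in zip(prices, emissions)) / total
    variance = sum(e * (p - mean) ** 2 for p, e in zip(prices, emissions)) / total
    return mean, math.sqrt(variance)


# Hypothetical regional prices (US$ per tCO2) and emissions (GtCO2).
prices = [20.0, 60.0, 300.0]
emissions = [12.0, 10.0, 3.0]
mean, sd = weighted_mean_and_sd(prices, emissions)
print(f"weighted mean = {mean:.1f}, weighted s.d. = {sd:.1f}")
```

With a uniform price the standard deviation is zero, so the second number in each label grows as prices become more differentiated.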
Relaxing the no-transfer constraint
Starting at the opposite end of the trade-off curve with full price differentiation, the nonlinear shape highlights the effect of limited transfers. Reducing the price spread by three quarters lessens the global inefficiency by 56%, but requires only 21% of transfers of the uniform price case. Owing to the strongly increasing mitigation costs, the marginal and total costs for OECD countries decline rapidly as their emissions reductions are relaxed, whereas increments of non-OECD countries are smaller, as their emissions constraints need to be tightened. Hence, starting from the solution of full price differentiation makes transfers highly effective with respect to reducing inefficiency while maintaining the equal-effort-sharing criterion.

Unintended effects of distorted markets
The solution to the efficiency–sovereignty trade-off has broader implications. Increasingly differentiated carbon prices lead to a reallocation of regional emissions and multiple market distortions. The regional reallocation of gross emissions and fossil fuel market distortions are largest for relatively small carbon price spreads (Fig. 3b, Extended Data Fig. 5). As the carbon tax spread grows, changes in fossil fuel use become exhausted, while carbon removal is intensified in OECD countries. Hence, untapped abatement potentials in low-carbon-price countries are largely offset by costly abatement options in high-income countries as their ability to reduce fossil fuel use is exhausted. This interacts with land market distortions: emissions in non-OECD countries increase due to land-use extensification by deforestation to export bioenergy, whereas OECD countries reduce agricultural land for afforestation while importing biomass that is used in the energy sector combined with carbon capture and storage (BECCS)⁸,⁹. Therefore, market distortions caused by regional policy differentiation risk detrimental impacts on environmental sustainability. See also Extended Data Figs. 5, 6.
The efficiency–sovereignty trade-off also interacts with the timing of mitigation measures across regions. Figure 3c shows for selected indicators of the energy sector transformation the year a threshold is reached. Under uniform carbon pricing, mitigation measures proceed at a similar speed in different regions³³. For example, BECCS deployment exceeds 250 MtCO2 yr−1 shortly after 2040 in OECD and non-OECD regions. Spreading carbon prices leads OECD countries to front-load and tighten the timing of measures, whereas non-OECD countries delay and stretch the timing. For instance, non-OECD countries exceed the
Extended Data Fig. 1 | Graphical illustration of the distributional effects between advanced economy A and developing economy B for different policy frameworks characterized by different marginal abatement cost functions $f_A$ and $f_B$. The case ‘Uniform price w/o transfers’ with carbon prices equal to p in both regions implies different mitigation costs. In the case ‘Uniform price w/ transfers’ these differences are neutralized by transfers T. Alternatively, in the case ‘No transfer’ the differentiation of policies leads to equal costs; this case is not depicted explicitly. The case ‘Hybrid’ differentiates carbon prices to reduce T. The change of global mitigation cost is only the difference between the regional mitigation costs $\Delta AC_a - \Delta AC_b$ (indicated by the red triangles), whereas the changes of transfers are represented by the orange rectangles $p\,\Delta R$. As long as the differentiation of prices is relatively small, the decline of transfers exceeds the increase in global mitigation costs.
Extended Data Fig. 2 | Illustration of the exponential compression function. The x axis shows the 2030 carbon prices in the full differentiation case (see Fig. 2a). The example applies the parameter α = 0.5 to the compression function $p_r = \tilde{p}_{\min}\,(\tilde{p}_r/\tilde{p}_{\min})^{\alpha}$ that has been introduced in the Methods. The y axis shows the carbon prices after compression. The figure shows a subset of regions and the light grey line highlights the compression for Latin America and the EU. We note that the set of compressed carbon prices is scaled to comply with the global carbon budget (that is, the relative differences of the carbon price spread remain constant).
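The exponential compression can be sketched in a few lines. The sketch assumes the form in which each regional price of the fully differentiated case is divided by the lowest regional price, raised to the power α and multiplied by that lowest price again. The region names and prices are hypothetical, and the subsequent joint rescaling to meet the carbon budget, which happens inside the model, is not reproduced.

```python
def compress_prices(prices, alpha):
    """Compress a {region: price} set towards its minimum price.

    alpha = 1 leaves the differentiated prices unchanged;
    alpha = 0 collapses them to a uniform price.
    """
    p_min = min(prices.values())
    return {region: p_min * (p / p_min) ** alpha for region, p in prices.items()}


# Hypothetical differentiated prices in US$ per tCO2 (not the paper's values).
differentiated = {"region_a": 4.0, "region_b": 100.0, "region_c": 400.0}
for alpha in (1.0, 0.5, 0.0):
    print(alpha, compress_prices(differentiated, alpha))
```

Under this form, the ratio of any two compressed prices equals the original ratio raised to α, which matches the pairwise description of the adjustment in the main text. Multiplying all compressed prices by a common factor afterwards leaves those ratios unchanged.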
Article
Extended Data Fig. 3 | Socioeconomic drivers and CO2 emissions in the no-policy baseline scenario. a, b, Population (a) and economic growth (b) from history 1990–2015 and in the SSP2 baseline scenario 2015–2100³⁰,⁶⁵. c, d, Energy and industry CO2 emissions (c) as well as total CO2 emissions (d). See ‘Data availability’ section for more details.
Extended Data Fig. 4 | Time path of transfers in the default case with
uniform carbon prices. The transfers are expressed as percentages of GDP in
the OECD and the non-OECD regions. The dashed line serves as a point of
orientation. It represents the share of the US$100 billion relative to the OECD’s
GDP in 2020.
Extended Data Fig. 5 | Effect of carbon price differentiation on primary and final energy use. a, b, Changes in the global energy mixes distinguished by energy carriers and regions. The figure depicts differences compared with the uniform carbon tax case for the 1,300 GtCO2 carbon budget. Primary energy (a) is measured according to the direct equivalence principle; final energy (b) is measured as delivered to final consumer. This means that fossil fuels, biomass and geothermal energy are measured in primary energy input, whereas renewables (hydro, wind and solar) as well as nuclear energy are measured by their electricity output. Notable results are as follows. First, the total amount of energy use increases. Second, OECD countries mostly reduce residual oil and gas consumption, but non-OECD countries mostly increase the use of coal; therefore, the total consumption of coal increases, whereas the global use of oil and gas decreases. Third, the total use of biomass increases due to increasing demand in OECD countries. Fourth, OECD countries accelerate modernization of final energy use by mainly reducing the use of liquids and gases, but increasing electricity and hydrogen. Finally, non-OECD countries delay modernization of energy use by mainly increasing the use of solids, liquids and gases with corresponding implications for air pollution and so on.
Extended Data Fig. 6 | Effect of carbon price differentiation on land-use change and investments. a, b, Changes in global land use (a) and investment and regions (b). The figure depicts differences compared with the uniform carbon tax case for the 1,300 GtCO2 carbon budget. The two regions show opposite changes in the variables; for example, OECD countries convert agricultural land into forests to remove carbon by afforestation, whereas non-OECD countries convert forests into cropland to grow biomass that is exported to OECD countries. Moreover, OECD countries increase investments in the energy sector substantially to facilitate the transition to low-carbon technologies and to invest in carbon-removal technologies (which are counted as part of the energy sector). The higher OECD investments crowd out the macroeconomic investments in OECD countries and energy sector investments in non-OECD countries.
Extended Data Fig. 7 | Sensitivity analysis of the trade-off curve. a, The sensitivity to the ban on bioenergy trade. b, The sensitivity with respect to the carbon budget. c, The variation of the maximum annual retirement rate of fossil-fuelled infrastructure from 9% to 6% and the delayed availability of technologies that rely on underground geological storage of CO2 (that is, CCS including direct air capture). d, The application of the linear compression function; in this sensitivity analysis the case of uniform taxes and the no-transfer case are identical.
Extended Data Fig. 8 | Regional aggregates used in REMIND and MAgPIE. Regions and countries belonging to the OECD region are coloured in blue tones.
See ‘Data availability’ section for more details. World map based on rworldmap package66.
Extended Data Table 1 | Overview of data sources for model comparisons
Scenario data in Fig. 1e, f were taken from the IPCC databases for the Fifth Assessment Report (AR5) and the Special Report on 1.5 °C warming (SR15)64. The carbon budgets are computed for the
period 2011–2100. Figure 1e, f shows relative differences between the ‘Policy case’ and the ‘Base case’.
Extended Data Table 2 | Overview of selected sensitivity cases
The mitigation costs and transfers are NPVs applying a 5% annual discount rate. The inefficiency is the difference between the mitigation costs in the differentiated and the uniform case. The
bold numbers are quoted in the text. The carbon prices in the default case are shown in Fig. 2a. The income growth sensitivity case uses per-capita GDP and population projections of SSP1 (but
the demands for final energy and food are the same as in the default case, which relies on SSP2). Note that in case of faster income growth and convergence (right-hand column), the carbon
price in the uniform case as well as the absolute mitigation costs increase, but the ratio of transfers to mitigation costs decreases. If the metric of consumption losses is used to measure effort,
the losses of fossil fuel-rich countries are more severe in the uniform price case and, consequently, in the differentiated case the carbon prices in these countries are lower.
Article
https://doi.org/10.1038/s41586-020-2920-6
Brian Leung¹,² ✉, Anna L. Hargreaves¹, Dan A. Greenberg³, Brian McGill⁴,⁵, Maria Dornelas⁶ & Robin Freeman⁷
Received: 28 January 2020
Rapid global change is threatening species across the globe¹. The quantification of biodiversity trends is important to assess whether current investment is slowing or reversing declines, and to identify regions and taxa of concern. Although distilling disparate population trends into a single global index can focus attention on biodiversity trends²⁻⁴, simple metrics can distort the full picture.
Estimates of global biodiversity trends vary depending on their data and mathematical model. The most apocalyptic models gather extensive press coverage, even when based on controversial data (for example, ‘biological annihilation’⁵, which described trend estimates based largely on expert opinion; or ‘insect Armageddon’, which is based on data disputed by the original collectors⁶). However, even analyses of the best available data reach conflicting conclusions. An analysis of a global dataset of abundance time series of vertebrates estimated that, on average, vertebrate populations have declined by more than 50% since 1970 (Living Planet Index² (LPI)); however, other global analyses found that the mean population size⁷,⁸ and species richness⁹,¹⁰ have remained stable over similar timeframes. Explanations for the discrepancies have been proposed⁸,¹¹⁻¹³, but not resolved.
One crucial consideration is that summary indices may be easily misinterpreted. Calculating the geometric mean across populations is the most common and straightforward approach, but is strongly influenced by extremes. To illustrate, imagine an ecosystem in which one population declined by 99%. Even if a second population increased 50-fold or 393 populations increased by 1% (that is, a large net increase), a geometric mean would show a catastrophic 50% decline. Thus, a geometric mean decline of 50% could arise from substantial, widespread loss that is occurring across many populations (we term this the ‘catastrophic declines’ hypothesis) or from a few extremely declining populations (we term this the ‘clustered declines’ hypothesis). Both scenarios involve important conservation issues, but suggest vastly different underlying problems and require different mitigation strategies¹⁴, thus distinguishing between them is of real-world importance.
We derive a Bayesian hierarchical mixture (BHM) model to distinguish between the catastrophic and clustered declines hypotheses. The model statistically separates population trends into extreme declines, typical trends and extreme increases (Fig. 1), while accounting for time-series size, within-population fluctuations, number of populations and among-population variance. We test declines in abundance for more than 14,000 vertebrate populations (from the LPI)¹⁵. We chose LPI data because of its large scope, because the data and analytical details were publicly available, and because previous analyses of these data suggested widespread, global declines².
We first examined whether the previous estimate² of a mean decline of more than 50% was sensitive to extreme populations: robust declines would support the catastrophic declines hypothesis, whereas high
¹Department of Biology, McGill University, Montreal, Quebec, Canada. ²Bieler School of Environment, McGill University, Montreal, Quebec, Canada. ³Department of Biological Sciences, Simon Fraser University, Burnaby, British Columbia, Canada. ⁴School of Biology and Ecology, University of Maine, Orono, ME, USA. ⁵Mitchell Center for Sustainability Solutions, University of Maine, Orono, ME, USA. ⁶Centre for Biological Diversity, University of St Andrews, St Andrews, UK. ⁷Indicators and Assessments Unit, Institute of Zoology, Zoological Society of London, London, UK.
✉e-mail: brian.leung2@mcgill.ca
Fig. 3 | Population trends by taxonomic groups and realms. a, The terrestrial realm. b, The freshwater realm. c, The marine realm. Red and blue asterisks indicate the occurrence of extremely declining clusters (16 systems) and increasing clusters (8 systems), respectively. Distributions show the primary cluster in each system. Red, significant declines; blue, significant increases; orange, strong non-significant declines; green, strong non-significant increases; yellow, weak changes. Maps were created using ArcGIS software by Esri (ArcGIS and ArcMap are the intellectual property of Esri and are used herein under licence. Copyright © Esri. All rights reserved. For more information about Esri software, please visit https://www.esri.com).
Table 4). Although size-specific models included fewer populations, especially for smaller species, the number of clusters was not uniformly lower (as might be expected given a reduction in power); therefore, the differential occurrence of extremely declining versus increasing clusters suggests that large animals are more vulnerable to extreme declines.

Evidence for catastrophic declines
In contrast to the extreme clusters, the primary clusters accounted for the vast majority (98.6%) of populations across the 57 LPI systems. The overall growth rate of primary clusters was close to zero: θ1 = −0.00035, corresponding to around 1.7% loss over 50 years, given a constant rate across populations and time (Fig. 5). In addition, in contrast to extreme clusters, primary cluster trends were robust to time-series size, as excluding series with fewer than 10 data points yielded a similar overall global trend (θ1 = 0.0043) (Extended Data Fig. 3).
Although the global BHM model reveals considerably more nuance than a geometric mean index, analysing across systems still masked important patterns. When systems were analysed separately (Supplementary Table 2), primary population clusters were strongly declining
Fig. 4 | Effect of the size of the time series. The number of data points in the time series versus the mean log-transformed value of the geometric mean growth rate. [Axes: x, number of data points in time series; y, log(mean growth rate).]
Fig. 5 | Populations in the primary clusters across all systems, after removal of extreme clusters. The primary cluster of each system is unimodal, but because systems are experiencing decline (or growth) heterogeneously, plotting distributions across systems shows multimodality. Histograms show significantly declining systems (red), strongly but not significantly declining systems (orange) and weak changes or increases (yellow). Vertical lines show thresholds for strongly declining (−0.015) and strongly increasing (+0.015) growth rates, corresponding to an approximate 50% loss or a doubling (over 50 years), respectively. Distributions of primary clusters were calculated based on the mean and s.d. from the hierarchical model, and using the system-specific weights to adjust for species richness. [Axes: x, log(annual growth rate); y, frequency.]
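The correspondence between annual log growth rates and 50-year changes can be checked with a short calculation. The sketch assumes natural logarithms and a constant rate compounded over 50 years. Under that assumption the ±0.015 thresholds give roughly a halving and a doubling, and the overall primary-cluster estimate of −0.00035 quoted in the main text gives a loss of about 1.7%. The function name is a hypothetical helper.

```python
import math


def cumulative_change(annual_log_growth, years=50):
    """Fractional change in abundance after `years` at a constant log growth rate."""
    return math.exp(annual_log_growth * years) - 1.0


for rate in (-0.015, -0.00035, 0.015):
    print(f"log growth {rate:+.5f} per year -> "
          f"{100 * cumulative_change(rate):+.1f}% over 50 years")
```

The small asymmetry between the two thresholds (a little more than a halving, a little more than a doubling) is why the caption describes the correspondence as approximate.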
Extended Data Fig. 1 | Theoretical analyses of BHM model. The p–p plots show that the posterior distributions for each estimated parameter are unbiased and largely follow a 1:1 line for each hyperparameter (σ, τ) as well as the fraction in each cluster ($f_1$, $f_2 = 1 - f_1$). The 1:1 line is the theoretical expectation, indicating that the true parameter value falls below the 0.01 quantile 1% of the time, the 0.02 quantile 2% of the time, and so on.
Extended Data Fig. 2 | Sensitivity analyses of primary cluster trends. The
trends of the primary clusters (θ1), for the main analysis (x axis) versus the
sensitivity analysis (y axis) for the threshold for extreme clusters (top) and the
offset when n = 0 was observed (bottom).
Extended Data Fig. 3 | Effect of small time series on primary cluster trends.
Each point represents a trend estimate for the primary cluster of a system, with
the full dataset (x axis) versus data excluding time series with fewer than 10 data
points (y axis). The red dot indicates the freshwater Indo-Pacific mammals,
which were reduced from 22 populations (full) to 2 populations (only data with
at least 10 data points).
Extended Data Fig. 4 | Mean trends of primary clusters across systems
calculated using the BHM model. Top, all species (14,700 populations).
Middle, only large species (9,596 populations). Bottom, only small species
(5,103 populations). The small species appear to be declining more than large
species, although this finding needs to be interpreted with caution, as most
primary distributions did not significantly deviate from zero for small species.
Extended Data Fig. 5 | Histograms of observed growth rates and output of the BHM model for systems 1–16. Blue line, primary cluster; red line, extreme cluster(s) from the model. Grey vertical lines show the range of observed values. In comparing the model output to the data we show the following. (1) The variation of the BHM primary cluster (blue line) is much lower than the raw data, because the BHM separates variation in among-population trends from variation due to within-population fluctuations. (2) The BHM model identifies evidence for extreme clusters in both directions (for example, terrestrial Indo-Pacific birds) or only one direction (for example, terrestrial Neotropical mammals), but not for other apparent clusters (for example, terrestrial Indo-Pacific herps). The BHM integrates the magnitude of within-population fluctuations, time-series sizes, number of populations, among-population variance, and the magnitude and frequency of the extreme populations in determining whether additional (extreme) clusters are needed to account for the observations.
Extended Data Fig. 6 | Histograms of observed growth rates and output of the BHM model for systems 17–32. Blue line, primary cluster; red line, extreme
cluster(s) from the model. Grey vertical lines show the range of observed values. For further information, see Extended Data Fig. 5.
Extended Data Fig. 7 | Histograms of observed growth rates and output of the BHM model for systems 33–48. Blue line, primary cluster; red line, extreme
cluster(s) from the model. Grey vertical lines show the range of observed values. For further information, see Extended Data Fig. 5.
Extended Data Fig. 8 | Histograms of observed growth rates and output of the BHM model for systems 49–57. Blue line, primary cluster; red line, extreme
cluster(s) from the model. Grey vertical lines show the range of observed values. For further information, see Extended Data Fig. 5.
nature research | reporting summary
Corresponding author(s): Brian Leung
Last updated by author(s): Aug 27, 2020
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Data analysis: Bayesian analyses were conducted using the STAN 2.14 language, and processed and analyzed in R 3.6.3. The lme4 1.1-23 package was referenced in the text. Custom code from this article can be obtained at: https://doi.org/10.5281/zenodo.3901586
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers.
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability
Data can be obtained from the Living Planet Index database <www.livingplanetindex.org/> (2016), the AmphiBIO database from <https://figshare.com/articles/Oliveira_et_al_AmphiBIO_v1/4644424>, the Fishbase database <www.fishbase.org>, and mammal, bird and reptile life history traits from <https://doi.org/10.6084/m9.figshare.c.3308127.v1>.
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf
Research sample The data was obtained from the Living Planet Index database. <www.livingplanetindex.org/>. (2016), and consisted of 15241
vertebrate populations. To avoid double counting, when a species contained both finer resolution estimates within a country (2593
entries) as well as a country-wide aggregate, we excluded the country-wide aggregate (537 entries). This resulted in 14700
populations remaining in our analysis. Each system was defined by a combination of habitat domain (terrestrial, freshwater and
marine), biogeographic realm, and taxonomic grouping (Fish=Actinopterygii, Elasmobranchii, Holocephali, Myxini, Chondrichthyes,
Sarcopterygii, Cephalaspidomorphi; Birds=Aves, Mammals=Mammalia, Herps = Amphibia, Reptilia). Terrestrial and freshwater habitat
domains were separated into five realms (Afrotropical, Nearctic, Neotropical, Palearctic, and Indo-Pacific), whereas the marine
domain was separated into six realms (Arctic, Atlantic north temperate, Atlantic tropical/sub-tropical, Pacific north temperate, Indo-
Pacific tropical/sub-tropical, and South-temperate/Antarctic).
Sampling strategy All population time-series data in the LPI dataset were used. To avoid double counting, when a species contained both finer
resolution estimates within a country (2593 entries) as well as a country-wide aggregate, we excluded the country-wide aggregate
(537 entries). This resulted in 14700 populations remaining in our analysis.
Data collection The data were obtained by Dan Greenberg and downloaded from the publicly available databases identified in the data availability statement.
Timing and spatial scale Data were analyzed for 1970–2014, as this period coincided with the analyses from the Living Planet Index. The spatial scale for the analysis was global. The data comprised 14700 populations across many studies and were therefore measured at many scales, so relative changes per population were used.
Data exclusions To avoid double counting, when a species contained both finer resolution estimates within a country (2593 entries) as well as a
country-wide aggregate, we excluded the country-wide aggregate (537 entries). This resulted in 14700 populations remaining in our
analysis.
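The exclusion rule described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' code: the `Entry` record and its field names (`species`, `country`, `is_aggregate`) are hypothetical, and the real LPI database uses its own schema.

```python
from collections import namedtuple

# Hypothetical record layout; the actual LPI column names are not given in the text.
Entry = namedtuple("Entry", "species country is_aggregate")

def drop_double_counted(entries):
    """Remove country-wide aggregates when finer-resolution estimates
    exist for the same species in the same country."""
    # Species/country pairs with at least one finer-resolution estimate.
    has_fine = {(e.species, e.country) for e in entries if not e.is_aggregate}
    return [
        e for e in entries
        if not (e.is_aggregate and (e.species, e.country) in has_fine)
    ]

# Toy data (invented for illustration).
entries = [
    Entry("species A", "country X", False),  # finer-resolution estimate
    Entry("species A", "country X", True),   # aggregate, excluded
    Entry("species B", "country X", True),   # aggregate with no finer data, kept
]
kept = drop_double_counted(entries)
```

An aggregate is kept whenever it is the only estimate for a species in a country, so no species is lost from the analysis.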
Reproducibility This is not relevant, as the existing LPI database was used. The study was not an experiment; its purpose was to re-analyze the available information on vertebrate trends, to evaluate whether previous estimates of decline (>50%) were due to clusters of extremely declining populations, and to analyze extreme clusters and primary clusters separately.
Randomization This is not relevant, as the existing LPI database was used, chosen for its impressive size and geographic coverage, and because previous analyses of these data suggested broad-scale average vertebrate declines.
Blinding This is not relevant. Blinding, as done in clinical trials where the group assignment of individuals is hidden from some researchers, does not apply here, because primary data collection and experiments were not conducted in this study.
Materials & experimental systems Methods
Article
Our understanding of the evolution of Mesozoic birds continues to improve, driven predominantly by discoveries from the Early Cretaceous epoch of China1–3,6. Although these specimens show considerable variation in body size, soft-tissue anatomy and inferred ecologies2–4,10,11, the disparity in Mesozoic avialan cranial shape remains restricted to a relatively limited number of forms that are considered to be either generalists or substrate-probing specialists5,6,12–15 and represent groups that are only distantly related to crown birds. The Late Cretaceous (about 100–66 million years ago) chapter of avialan evolution remains relatively incomplete owing to a paucity of new fossil discoveries (although see recent studies on birds such as Ichthyornis16 and Asteriornis17). Thus, new fossils of Late Cretaceous birds are essential for refining hypotheses that relate to the morphological evolution and diversification of avialans.
The phylogenetic diversity of early branching (non-neornithine) Mesozoic birds is dominated by enantiornithines, which have been heralded as the first diversification of avialans and are characterized by a range of body sizes and inferred habits2,14,18–21. This radiation is notable for its apparent near-global distribution throughout most of the Cretaceous period. An exceptionally well-preserved partial cranium of a previously unknown enantiornithine (University of Antananarivo [UA] 10015) from the latest Cretaceous (Maastrichtian) of Madagascar falls within a critical spatiotemporal gap. Very few avialans are known from the entire Cretaceous period of Afro-Madagascar. The specimen expands our knowledge of realized cranial shape disparity, in terms of both morphological details and the proportions of elements, within the enantiornithine radiation and Mesozoic birds as a whole.

Systematic palaeontology
Theropoda Marsh, 1881
Paraves Sereno, 1997
Avialae Gauthier, 1986
Ornithothoraces Chiappe, 1995
Enantiornithes Walker, 1981
Falcatakely forsterae gen. et sp. nov.

Etymology. ‘Falcata’ (from Latin falcatus), meaning armed with a scythe, in reference to the shape of the rostrum; ‘kely’ (Malagasy), meaning small; ‘forsterae’, in recognition of Catherine A. Forster’s contributions to work on Madagascan paravians.
Holotype. Partial cranium (University of Antananarivo, UA 10015), which consists of the rostrum, palate and periorbital regions (Fig. 1, Extended Data Figs. 1, 2 and Supplementary Videos 1–8).
1Department of Biomedical Sciences, Heritage College of Osteopathic Medicine, Ohio University, Athens, OH, USA. 2Ohio Center for Ecological and Evolutionary Studies, Ohio University, Athens, OH, USA. 3Department of Earth Sciences, Denver Museum of Nature & Science, Denver, CO, USA. 4Department of Anatomical Sciences, Stony Brook University, Stony Brook, NY, USA. 5Centre for Integrative Anatomy, Department of Cell and Developmental Biology, University College London, London, UK. 6Geology Department, Macalester College, St Paul, MN, USA. 7Département de Sciences de la Terre et de l’Environnement, Université d’Antananarivo, Antananarivo, Madagascar. ✉e-mail: oconnorp@ohio.edu
Fig. 2 tree labels (only partly legible in this copy): clades Enantiornithes and Ornithuromorpha; tips include Longusunguis, Shenqiornis, Sulcavis, Falcatakely, Sapeornis, Confuciusornis sanctus, Changchenornis, Eoconfuciusornis, Jeholornis, Archaeopteryx, Baptornis, Hesperornis, Parahesperornis, Vegavis, Anas and Gallus.
Fig. 2 | Mosaic evolution of the avialan facial skeleton as depicted among select early branching forms. Phylogenetic analysis places Falcatakely among enantiornithine birds. The illustration of Xinghaiornis(*) is placed near its approximate position in the phylogeny based on a previous publication15. Illustrations are not to scale. Red, premaxilla; green, maxilla; yellow, nasal; lavender, lacrimal; blue, dentary. Illustrations of Archaeopteryx, Ichthyornis, Hesperornis and Gallus were modified from a previous publication16. See Supplementary Information for additional details for included taxa and phylogenetic analyses.
avialans. The palatine is triradiate, with a long, thin rostral process that abuts the maxilla (Extended Data Fig. 2a). The palatine does not contact the jugal and only modestly contacts the pterygoid, but shares an elongate contact with the ectopterygoid. A dorsomedially directed choanal process sweeps towards the midline to join its antimere. Only the thin rostral processes of the pterygoids are preserved in UA 10015; these processes are in close association with the palatines (Extended Data Fig. 2a). The ectopterygoid, an element that is unknown in most Cretaceous avialans24,31, is represented by a robust body and a thin, elongate, uncinate process that contacts the jugal bar (Extended Data Fig. 2a). The vomers are represented by two thin, dorsoventrally restricted laminar plates that extend rostrally between the two maxillae (Extended Data Fig. 2a, c). Thin sheets of bone are present just rostral to the pterygoids, potentially representing the expanded caudal end of the vomer, reminiscent of the condition in Gobipteryx24,31.

Mosaic evolution in the avian beak
Our phylogenetic analyses recover Falcatakely nested within Enantiornithes (Fig. 2 and Extended Data Figs. 3, 4). The long, deep and narrow rostrum of Falcatakely, dominated by an expanded maxilla, provides a stark contrast to the facial region formed by the premaxilla and maxilla in other enantiornithines and more-crownward non-neornithines. Even among rostrally elongated ornithothoracine taxa such as Longipteryx, Longirostravis and Dingavis, this morphology is achieved through a concomitant reduction in premaxillary and maxillary height as bones elongate along the rostrocaudal axis5,6,12,14,15.
Quantitative assessment of non-avialan and avialan (including Neornithes) facial shape demonstrates the combination of a derived cranial phenotype in Falcatakely (that is, a neornithine-like expanded rostrum) formed by an underlying plesiomorphic paravian skeletal framework. We used two-dimensional geometric morphometrics (Fig. 3) to compare the maxillary and premaxillary shape in UA 10015 to that of a sample of fossil non-avialan theropods, as well as the crown birds Gallus gallus (red junglefowl) and Nothoprocta pentlandii (Andean tinamou). Principal component analysis reveals that species group together on the basis of the ratio of the maxillary to premaxillary size (the first principal component) and the ratio of the rostrocaudal length to dorsoventral height of both elements (second principal component). Despite having maxillary and premaxillary proportions that are similar to those of non-avialan theropods (for example, paravians, oviraptorosaurs and ornithomimosaurs), Falcatakely exhibits an overall rostrum phenotype that is convergent on a number of neornithine groups.
The configuration of the individual skeletal elements in Falcatakely is more similar to the non-avialans Microraptor and Zanabazar than to ornithuromorphs (including neornithines) owing to the expanded maxilla and relatively small premaxilla. Nonetheless, the three-dimensional shape of the pre-orbital facial skeleton closely resembles that of some extant birds (Extended Data Figs. 5, 6), as assessed using three-dimensional geometric morphometrics to compare the shape of the maxilla, premaxilla and nasal within a sample of 349 extant birds32 (Supplementary Information). Principal component analysis of the rostrum shape reveals that Falcatakely occupies a position in whole-rostrum morphospace that is quantitatively similar to those
Extended Data Fig. 2 | Palatal and lateral facial regions of the Cretaceous enantiornithine bird Falcatakely (UA 10015, holotype). a, Digital polygon surface reconstruction (from microcomputed tomography scans) of the palate and lateral face in ventral view. b, Reconstructed outline drawing of Falcatakely in palatal view (shaded regions are not preserved). c, Digital polygon surface reconstruction of internal aspect of left facial skeleton (premaxilla, maxilla and nasal) and palate in right lateral view. The left and right sides are indicated as (l) and (r), respectively. The dashed line in c represents the approximate contour of the caudal margin (that is, the ventral ramus of the lacrimal) of the antorbital fenestra. Scale bar, 5 mm; the scale bar is representative for a and c; the reconstruction in b is not to the same scale. AOF, antorbital fenestra; bs, basisphenoid rostrum; cp, choanal process of the (right) palatine; ect, ectopterygoid; EN, external nares; jpmx, jugal process of the maxilla; mpmx, midline premaxilla; mx, maxilla; na, nasal; pal, palatine; pmx, premaxilla; pter, pterygoid; to, tooth; up, uncinate process of the ectopterygoid; vm, vomers.
Extended Data Fig. 3 | Majority-rule tree of Falcatakely among coelurosaurians from the Bayesian analysis of the TWiG matrix. Clades outside of the Avialae are collapsed for brevity. Posterior probabilities are placed above the nodes.
Extended Data Fig. 4 | Majority-rule tree of Falcatakely among avialans from the Bayesian analysis of a modified matrix that was previously published. A matrix modified from a previous study25 was used. Posterior probabilities are placed above the nodes.
Extended Data Fig. 5 | Geometric morphometric analysis of rostrum shape in Falcatakely among avians. Plot of the first two principal components of the three-dimensional landmark analysis of total rostrum shape of Falcatakely and extant avian taxa. Whereas the unique configuration of the maxilla and premaxilla in Falcatakely is more similar to those of non-avialan paravians (Fig. 3), the overall three-dimensional rostrum phenotype occupies the morphospace that is converged on by subsequent radiations of neornithine birds (Supplementary Data). See Supplementary Information for analytical protocols.
Extended Data Fig. 6 | Landmarking procedure for three-dimensional geometric morphometric analysis in dorsal and lateral views. a, Dorsal view.
b, Lateral view. Red spheres represent anatomical (type I) landmarks; yellow spheres are sliding semi-landmarks.
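As a rough illustration of how landmark configurations are turned into a morphospace, the following Python sketch aligns each landmark set to a reference by ordinary Procrustes superimposition and then runs a principal component analysis on the aligned coordinates. It is an assumption-laden simplification rather than the published workflow: the study reports using the Geomorph R package, whereas this sketch uses NumPy only, aligns to the first specimen instead of an iterated mean shape, does not slide semi-landmarks, and does not exclude reflections. The function names are hypothetical.

```python
import numpy as np

def align_to_reference(shape, reference):
    """Ordinary Procrustes fit of one landmark set (k x d) onto another."""
    def normalise(x):
        x = x - x.mean(axis=0)          # remove position
        return x / np.linalg.norm(x)    # remove size (unit centroid size)
    a, b = normalise(shape), normalise(reference)
    # Orthogonal Procrustes: the rotation minimising ||a @ r - b||.
    u, _, vt = np.linalg.svd(a.T @ b)
    return a @ (u @ vt)

def shape_pca(shapes):
    """Return PC scores and explained-variance fractions for aligned shapes."""
    reference = shapes[0]  # simplification: no iterated consensus shape
    aligned = np.array([align_to_reference(s, reference) for s in shapes])
    flat = aligned.reshape(len(shapes), -1)
    centred = flat - flat.mean(axis=0)
    _, singular, vt = np.linalg.svd(centred, full_matrices=False)
    scores = centred @ vt.T
    explained = singular**2 / np.sum(singular**2)
    return scores, explained

# Toy example: five noisy copies of a random six-landmark 2D configuration.
rng = np.random.default_rng(0)
base = rng.normal(size=(6, 2))
shapes = [base + 0.1 * rng.normal(size=(6, 2)) for _ in range(5)]
scores, explained = shape_pca(shapes)
```

The first two columns of `scores` correspond to the kind of PC1 versus PC2 plot described in the caption of Extended Data Fig. 5.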
nature research | reporting summary
Corresponding author(s): Patrick M. O'Connor
Last updated by author(s): Aug 17, 2020
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see our Editorial Policies and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Data analysis MrBayes v3.2; TNT 1.5; Tracer v1.6; Geomorph (R package), Adams and Otárola-Castillo, 2013; StereoMorph (R package); Avizo 7 (VSG), 9
(FEI/Thermo-Fisher Scientific), and Avizo Lite 2019 (ThermoScientific); Animation Producer in Avizo; Adobe Acrobat Pro DC (Continuous
Release) Version 2020, Adobe Premiere Pro (Creative Cloud edition).
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and
reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
April 2020
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability
UA 10015 is cataloged into the collections at the Université d’Antananarivo. Details regarding digital file development and derivatives of files (e.g., DICOM, PLY) used as part of the study are included in the Supplementary Information and archived on the MorphoSource website (https://www.morphosource.org/Detail/ProjectDetail/Show/project_id/7894). Phylogenetic character information and parameters used in the analyses are provided in the Supplementary Information. Executable files for phylogenetic analyses, interactive 3D morphospace plot, and interactive 3D PDFs are hosted on DRYAD: https://doi.org/10.5061/dryad.mkkwh70wg. This published work, including the novel genus (urn:lsid:zoobank.org:act:5BA26059-B428-4896-BFEA-2475419C61FC) and species (urn:lsid:zoobank.org:act:69314771-F0D8-4C15-946C-524164385FB7) along with the associated nomenclatural acts, has been registered in ZooBank: urn:lsid:zoobank.org:pub:4595D69E-FE12-4DAD-B155-89F084254F73.
Research sample This single cranium of Falcatakely (UA 10015) is the only known material of this taxon thus far discovered; it is known exclusively from
the Upper Cretaceous (Maastrichtian) of northwestern Madagascar.
Sampling strategy The study developed herein involves the description of a new taxon based on direct observation, light microscopy, and micro-computed tomography of the fossil represented by this single cranium. Digital preparation allowed for a complete analysis and reconstruction of individual elements of the cranium.
Data collection The holotype of Falcatakely forsterae (UA 10015) was collected from locality MAD05-42 by hand quarrying (ice pick, brush, rock
hammer), with subsequent emplacement in a plaster jacket prior to removal for laboratory processing. Mechanical and digital
preparation of the fossil was completed by J.R. Groenke, with interpretation of the anatomy (both of the fossil itself and digital
reconstructions/interpretations) by P.M. O'Connor, A.H. Turner, and J.R. Groenke. R.N. Felice led the morphometric analyses
included herein. Character scorings were assessed by A.H. Turner and P.M. O'Connor, with phylogenetic analyses completed by A.H. Turner.
Timing and spatial scale The specimen was originally collected during the 2010 calendar year, but only initially prepped and CT scanned (medical CT scanner
of the plaster jacket) that same year, yielding an ambiguous identification. J.R. Groenke (Ohio University) did additional mechanical
preparation in March 2017, immediately followed by a high-resolution microCT scan in April 2017. Intensive digital preparation then
ensued between April 2017 and January 2018, with subsequent, albeit intermittent, digital preparation, interpretation and
refinement of models through January 2019.
Reproducibility Not applicable; given that this paper focuses on a single specimen thus far known to humankind, it does not fall into the category for
being reproducible. However, the datasets assembled for this study are publicly available for future reanalyses by other workers.
Location The holotypic specimen was collected from the Upper Cretaceous Maevarano Formation, Mahajanga Basin, Madagascar.
Approximate coordinates: S 15 degrees, 54' 20.94", E 46 degrees, 35' 00.23"
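For use in mapping software, the degrees, minutes and seconds quoted above can be converted to signed decimal degrees. The short Python sketch below shows the arithmetic; the helper name `dms_to_decimal` is hypothetical, and the sign convention (south and west negative) is the usual one rather than something stated in the text.

```python
def dms_to_decimal(hemisphere, degrees, minutes, seconds):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    # Assumed convention: southern and western hemispheres are negative.
    return -value if hemisphere in ("S", "W") else value

# Approximate locality coordinates as given in the reporting summary.
latitude = dms_to_decimal("S", 15, 54, 20.94)
longitude = dms_to_decimal("E", 46, 35, 0.23)
```

With these inputs the locality falls at roughly 15.9058 degrees south and 46.5834 degrees east.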
Access & import/export The specimen was collected under a Collaborative Agreement with the University of Antananarivo and various ministries (Ministry of Mines, Ministry of Higher Education) of the Madagascar government. Permits from the Ministry of Mines (Scientific Studies Authorization No 005/2010) and the Ministry of Higher Education/University of Antananarivo (No 76 PAB/10, Supporting documentation: - Scientific Authorization Studies No 007/2010, 005/2010, 006/2010, 009/2010) used in support of field research were issued on 17 June 2010 and 18 June 2010, respectively.
Disturbance This study involved minimal disturbance to the environment, as the fossil-bearing layer was within 0.70 meters of the surface in the
locality.
Materials & experimental systems Methods
Specimen deposition The holotype specimen of Falcatakely forsterae is reposited in the University of Antananarivo (UA), Madagascar with the collection number UA 10015.
Dating methods No new dates were obtained for this contribution; age constraint for the Maevarano Fm. is developed in Rogers et al. 2000.
Tick this box to confirm that the raw and calibrated dates are available in the paper or in Supplementary Information.
Ethics oversight Fossil collection and exportation were completed in compliance with permits issued by the Ministry of Mines (United Republic of
Madagascar) and through a Collaborative Agreement with the University of Antananarivo and various ministries (Ministry of Mines,
Ministry of Higher Education) of the Madagascar government.
Note that full information on the approval of the study protocol must also be provided in the manuscript.
Article
Wheat is a staple food across all parts of the world and is one of the most widely grown and consumed crops7. As the human population continues to grow, wheat production must increase by more than 50% over current levels by 2050 to meet demand7. Efforts to increase wheat production may be aided by comprehensive genomic resources from global breeding programs to identify within-species allelic diversity and determine the best allele combinations to produce superior cultivars2,8.
Two species dominate current global wheat production: allotetraploid (AABB) durum wheat (Triticum turgidum ssp. durum), which is used to make couscous and pasta9, and allohexaploid (AABBDD) bread wheat (Triticum aestivum), used for making bread and noodles. A, B and D in these designations correspond to separate subgenomes derived from three ancestral diploid species with similar but distinct genome structure and gene content that diverged between 2.5 and 6 million years ago10. The large genome size (16 Gb for bread wheat), high sequence similarity between subgenomes and abundance of repetitive elements (about 85% of the genome) hampered early wheat genome-assembly efforts3. However, chromosome-level assemblies have recently become available for both tetraploid11,12 and hexaploid wheat1,13. Although these genome assemblies are valuable resources,
Fig. 1 panel labels: lines shown include ArinaLrFor, Jagger, Chinese Spring, Julius, Paragon, Cadenza, SY Mattis, Weebill 1, LongReach Lancer, PI190962 (spelt wheat), CDC Stanley, CDC Landmark and Mace. c, Axes: unique NLRs and number of lines, with sequence identity from 95% to 100%. d, Axes: number of lines (unique to 11) and chromosomal location (% length); colour scale: LTR-retrotransposon density, low to high.
Fig. 1 | Patterns of variation in the wheat genome. a, Principal component analysis of polymorphisms from exome-capture sequencing of about 1,200 lines (grey markers), 16 lines from whole-genome shotgun resequencing (orange markers) and our new assemblies (black markers). Text colours reflect different geographical locations and winter or spring growth. b, Dendrogram of pairwise Jaccard similarities for gene PAV between all RQA assemblies. c, Number of unique NLRs at different per cent identity cut-offs as the number of genomes increases. Dashed vertical lines represent 90% of the NLR complement. Markers indicate the mean values of all permutations of the order of adding genomes. Whiskers show maximum and minimum values based on one million random permutations. d, Chromosomal location versus insertion age distribution of unique to (reading downward) increasingly shared syntenic full-length LTR retrotransposons.
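The pairwise Jaccard similarity used for the gene presence/absence (PAV) dendrogram in panel b can be illustrated with a short Python sketch. The gene identifiers and the contents of each set below are invented for illustration, and the functions are hypothetical helpers rather than the authors' pipeline.

```python
def jaccard(a, b):
    """Jaccard similarity: shared genes divided by genes present in either line."""
    a, b = set(a), set(b)
    union = a | b
    # Two empty sets are treated as identical (an arbitrary convention).
    return len(a & b) / len(union) if union else 1.0

def pairwise_jaccard(presence):
    """Similarity for every unordered pair of lines in a {line: genes} mapping."""
    names = sorted(presence)
    return {
        (x, y): jaccard(presence[x], presence[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }

# Toy PAV data: gene names and memberships are made up.
presence = {
    "Jagger": {"g1", "g2", "g3"},
    "Julius": {"g2", "g3", "g4"},
    "Mace": {"g1", "g2", "g3"},
}
similarities = pairwise_jaccard(presence)
```

A distance of one minus the similarity could then be passed to any hierarchical clustering routine to draw a dendrogram.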
RLC-Angela fl-LTRs were the most abundant (21,000–27,000 full-length copies per genome) and analysis of variant patterns identified several chromosomal segments that contained numerous unique or rare retrotransposon insertions (Extended Data Fig. 5), which, on the basis of breeding history, we hypothesize to represent introgressions. For example, the LongReach Lancer RQA revealed two unique regions, a pericentric region on chromosome 2B and a segment on the end of chromosome 3D (Fig. 2a, b), both of which affect chromosome length (Extended Data Fig. 5). We used pedigree analysis to postulate the source of the introgressions and performed whole-genome sequencing of multiple accessions of putative donors. LongReach Lancer carries the stem rust resistance gene Sr36, derived from an introgression from Triticum timopheevii, and the resistance genes Lr24 (leaf rust) and Sr24 (stem rust), derived from tall wheatgrass28,29 (Thinopyrum ponticum). We generated whole-genome sequence reads from multiple T. ponticum and T. timopheevii accessions (Supplementary Table 12) and alignment to the LongReach Lancer RQA confirmed a T. ponticum introgression spanning a region of approximately 60 Mb of chromosome 3D (Fig. 2a), whereas T. timopheevii aligned to the majority (427 Mb) of chromosome 2B (Fig. 2b). Overall, we identified 341 chromosomal segments larger than 20 Mb with unique or rare fl-LTR insertion patterns that were present in only 1 to 4 of the RQA genomes, of which 273 insertion patterns were uniquely associated with a single genome (Supplementary Tables 13–16). The majority of unique regions were in PI190962 (spelt wheat; Triticum aestivum ssp. spelta), which was expected, given that it diverged from modern bread wheat several thousand years ago.
A similar strategy was used to confirm RLC-Angela variation at the telomeric region of chromosome 2A in Jagger, Mace, SY Mattis and CDC Stanley (Fig. 2c), which corresponds to the 2NvS introgression from Aegilops ventricosa (Supplementary Note 5). This introgression is a well-known source of resistance to wheat blast30, and contains the Lr37–Yr17–Sr38 gene cluster, which provides resistance to several rust diseases31. Sequencing of A. ventricosa accessions (Supplementary Table 12) followed by comparison of chromosomes with the RQAs confirmed that Jagger, Mace, SY Mattis and CDC Stanley carry the 2NvS introgression, which spans about 33 Mb on chromosome 2A (Fig. 2c, Extended Data Fig. 6a). We annotated the coding genes within this region and identified 535 high-confidence genes; more than 10% were predicted to be associated with disease resistance, including genes that encode putative NB-ARC and NLRs (Extended Data Fig. 6b, Supplementary Tables 17, 18). Furthermore, we used genotyping by sequencing to detect the 2NvS segment in three wheat panels and discovered that its frequency has been increasing in breeding germplasm and its presence is consistently associated with higher grain yield (Extended Data
Fig. 2 panel labels: d, Julius chromosome 4D versus Chinese Spring chromosome 4D position (Mb), with the centromere shift marked. Remaining axis labels: chromosome 7B (Mb) and chromosome 5B (Mb). g, SY Mattis Hi-C and Norin 61 Hi-C.
Fig. 2 | Introgressions and large-scale structural variation in wheat. a–c, T. ponticum introgression on chromosome 3D in LongReach Lancer (a), T. timopheevii introgression on chromosome 2B in LongReach Lancer (b) and A. ventricosa introgression on chromosome 2A in Jagger (c). Track i, map of polymorphic RLC-Angela retrotransposon insertions (legend at bottom); track ii, density of projected gene annotations from Chinese Spring (blue bars, scaled to maximum value); track iii, per cent identity to Chinese Spring based on chromosome alignment (yellow; scale is 0–100%); track iv, read depth of wheat wild relatives (blue–yellow heat map; legend at bottom). d, Dot plot alignment showing chromosome-level collinearity (black) with relative density of CENH3 ChIP–seq mapped to 100-kb bins for Chinese Spring (blue) and Julius (red); the arrow indicates a centromere shift. e, Robertsonian translocation between chromosomes 5B and 7B in ArinaLrFor. f, g, Cytology (f) and Hi-C (g) confirm the 5B/7B translocation in SY Mattis (left) compared with the non-carrier Norin 61 (right). In f, five independent cells were observed; the translocation was confirmed independently ten times. Scale bar, 10 μm.
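The relative ChIP–seq density in 100-kb bins mentioned for panel d can be sketched as follows. This Python example is a hedged illustration only: it assumes read positions are already available as integer coordinates on one chromosome, the function names are hypothetical, and the simple "densest window" search stands in for whatever centromere-calling procedure the authors actually used.

```python
import numpy as np

def binned_density(read_positions, chrom_length, bin_size=100_000):
    """Count reads per fixed-size bin and scale so the densest bin equals 1."""
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    edges = np.arange(n_bins + 1) * bin_size
    counts, _ = np.histogram(read_positions, bins=edges)
    peak = counts.max()
    return counts / peak if peak > 0 else counts.astype(float)

def densest_window(density, window_bins, bin_size=100_000):
    """Return (start, end) in bp of the contiguous window with the highest summed density."""
    sums = np.convolve(density, np.ones(window_bins), mode="valid")
    start = int(np.argmax(sums))
    return start * bin_size, (start + window_bins) * bin_size

# Synthetic chromosome of 50 Mb: sparse background plus an enriched 8-Mb block.
background = np.arange(0, 50_000_000, 50_000)
enriched = np.arange(20_000_000, 28_000_000, 1_000)
positions = np.concatenate([background, enriched])

density = binned_density(positions, chrom_length=50_000_000)
window = densest_window(density, window_bins=80)
```

Comparing the window found in two assemblies along a collinear alignment is one simple way to visualise a shift such as the one marked by the arrow in panel d.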
Fig. 6c, d, Supplementary Tables 19, 20). Of note, we identified about 60 genes belonging to the cytochrome P450 superfamily, which have been implicated in abiotic and biotic stress tolerance32 and have been functionally validated to influence grain yield in wheat33. Together, these data indicate that the modern wheat gene pool contains many chromosomal segments of diverse ancestral origins, which can be identified by their transposable-element signatures. We also confirmed the wild-relative origins of three introgressions within the RQA assemblies, a first step towards characterizing causal genes for breeding targets, such as resistance to wheat blast and rust fungi.

Centromere dynamics
Centromeres are vital for cell division and chromosome pairing during meiosis. In plants, functional centromeres are defined by the epigenetic placement of the modified histone CENH334. We therefore used CENH3 chromatin immunoprecipitation and sequencing (ChIP–seq)35 to determine the positions and sizes (about 7.5–9.6 Mb) of the centromeres for each RQA (Supplementary Tables 21, 22), which were consistent with previous estimates for wheat1. Furthermore, all chromosomes showed a single active site, implying that previous reports of multiple active centromeres in Chinese Spring1 were artefacts of misoriented scaffolds. However, we found examples in which the relative position of the centromere was shifted owing to several pericentric inversions, including inversions on chromosomes 4B and 5B (Extended Data Fig. 7a, b). We also observed one instance in which the centromeric position changed, but was not associated with a structural event. Specifically, on chromosome 4D in Chinese Spring, the centromere is shifted by around 25 Mb relative to the consensus position (Fig. 2d). This shift was previously recognized by cytology but was hypothesized to result from a pericentric inversion36. However, the high degree of collinearity between genomes supports the hypothesis that Cen4D in
Fig. 3 panel labels: a, Adult; larvae; healthy and damaged kernels. b, Haplotype tracks for CDC Landmark, Paragon, Robigus, Mace, Claire, CDC Stanley, Chinese Spring, Weebill 1, Norin 61, Cadenza, Julius, ArinaLrFor, LongReach Lancer, SY Mattis and Jagger, with Sm1 marked in the 5 Mb to 25 Mb interval. c, Graphical genotypes: 1, Sm1 carrier; 2, Sm1 non-carrier; 3, Sm1 non-carrier (that is, Waskada); mutations W98* and G182R. Legend: NB-ARC, LRR, S/T kinase, MSP, transmembrane, mutations, alternative haplotype.
Fig. 3 | Cloning of the gene Sm1. a, The orange wheat blossom midge oviposits eggs on wheat spikes and the larvae feed on developing wheat grains, resulting in moderate to severe damage to mature kernels. b, Top, sections of chromosome 2B of the same colour in the same position share haplotypes (based on 5-Mb bins), with the exception of those in grey, which indicates a line-specific haplotype. The position of Sm1 is indicated with respect to the CDC Landmark assembly. Bottom, zoomed-in view of haplotype blocks (based on 250-kb bins) from 5 to 25 Mb positions on chromosome 2B, surrounding Sm1. CDC Landmark, Robigus and Paragon all carry the same haplotype surrounding Sm1 (teal). c, Top, anchoring of the Sm1 fine map to the physical maps of Chinese Spring and CDC Landmark and graphical genotypes of three haplotypes critical to localizing the Sm1 candidate gene. Bottom, annotation of the Sm1 candidate gene, which encodes NB-ARC and LRR motifs in addition to the integrated serine/threonine (S/T) kinase and MSP domains. Two independent ethyl-methanesulfonate-induced mutations (W98* and G182R) result in loss of function and susceptibility to the orange wheat blossom midge (light blue lines). An alternative haplotype was observed in the kinase region of Waskada (black).
Chinese Spring has shifted to a non-homologous position; this shifting of centromeres to non-homologous sites has also been reported in maize37. By characterizing the centromere positions for these diverse wheat lines, we provide strong evidence for changes in centromere position caused by structural rearrangements and centromere shifts.
Large-scale structural variation between genomes
Structural variants are common in wheat38, and impact genome structure and gene content. We characterized large structural variants using pairwise genome alignments (Extended Data Fig. 1), changes in three-dimensional topology of chromosomes revealed by Hi-C conformation capture directionality biases along the genome39,40 (Extended Data Fig. 8, Supplementary Table 23), which were confirmed by Oxford Nanopore long-read sequencing (Extended Data Fig. 2) and cytological karyotyping (Extended Data Fig. 7c, Supplementary Table 24, Supplementary Note 6). The most prominent event was a translocation between chromosomes 5B and 7B, observed in ArinaLrFor, SY Mattis (Fig. 2e–g) and Claire. Normally, chromosomes 5B and 7B are approximately 737 and 762 Mb long, respectively, and we estimated that the recombined chromosomes are 488 Mb (5BS/7BS) and 993 Mb (7BL/5BL) long, making 7BL/5BL the largest wheat chromosome (Extended Data Fig. 9a). In ArinaLrFor and SY Mattis, the 7BL/5BL breakpoint resides within an approximately 5-kb GAA microsatellite, which we were able to span using polymerase chain reaction (PCR) (Extended Data Fig. 9b, c). By contrast, the breakpoint on 5BS/7BS was less syntenic, and we detected polymorphic fluorescence in situ hybridization signals between ArinaLrFor and SY Mattis on the 5BS portion of the translocated chromosome segment, suggesting that the regions adjacent to the translocation events differ on 5BS/7BS (Supplementary Note 6).
To determine the stability of the translocation in breeding, we genotyped for the translocation event in a panel of 538 wheat lines that represent most of the UK wheat gene pool grown since the 1920s41. The translocation occurred in 66% of the lines and was selectively neutral (Supplementary Note 7). Notably, the Ph1 locus on chromosome 5B, which controls the pairing of homeologous chromosomes during meiosis42, is near the translocation breakpoint, but remained highly syntenic between translocation carriers and non-carriers. Genetic mapping and analysis of short-read sequencing data indicated that the 5B/7B translocated chromosomes recombine freely with 5B and 7B chromosomes (Extended Data Fig. 9d), suggesting that chromosome pairing is not affected by the translocation.
Haplotype-based gene mapping
To develop improved wheat cultivars, breeders shuffle allelic variants by making targeted crosses and exploiting the recombination that occurs during meiosis. These alleles, however, are not inherited independently, but rather as haplotype blocks that often extend across multiple genes that are in genetic linkage43,44. We quantified haplotype variation along chromosomes across the assemblies, and developed visualization software to support its utility (Supplementary Note 8). We used these haplotypes to characterize a locus that provides resistance to the orange wheat blossom midge (OWBM, Sitodiplosis mosellana Géhin), one of the most damaging insect pests of wheat, which is endemic in Europe, North America, west Asia and the Far East. Upon hatching, the first-instar larvae feed on the developing grains and damage the kernels (Fig. 3a). Sm1 is the only gene in wheat known to provide resistance to OWBM6. CDC Landmark, Robigus and Paragon are all resistant to the OWBM, and all three carry the same 7.3-Mb haplotype within the Sm1 locus on chromosome 2B (Fig. 3b).
To identify Sm1 gene candidates, we used high-resolution genetic mapping and refined the locus to a 587-kb interval in the CDC Landmark RQA (Fig. 3c, Extended Data Fig. 10a, Supplementary Table 25).
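The idea of several lines carrying the same haplotype over a contiguous stretch of bins can be illustrated with a minimal sketch. It assumes that per-bin haplotype calls are already available as integer labels for each line; the function name, the line names and the toy labels are hypothetical, and this is not the authors' visualization software.

```python
from typing import Dict, List, Tuple


def shared_haplotype_runs(
    calls: Dict[str, List[int]],
    lines: List[str],
    bin_size_bp: int,
) -> List[Tuple[int, int]]:
    """Return (start_bp, end_bp) intervals in which all `lines` share one label.

    `calls` maps a line name to its haplotype label per fixed-size bin
    (hypothetical input; how the labels are derived is outside this sketch).
    """
    n_bins = min(len(calls[name]) for name in lines)
    runs: List[Tuple[int, int]] = []
    run_start = None
    for i in range(n_bins):
        labels = {calls[name][i] for name in lines}
        if len(labels) == 1:
            # All selected lines agree in this bin: open or extend a run.
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # Disagreement closes the current run.
            runs.append((run_start * bin_size_bp, i * bin_size_bp))
            run_start = None
    if run_start is not None:
        runs.append((run_start * bin_size_bp, n_bins * bin_size_bp))
    return runs


# Toy labels for three hypothetical lines, using 250-kb bins.
toy_calls = {
    "line_A": [1, 1, 2, 2, 3],
    "line_B": [1, 1, 2, 4, 3],
    "line_C": [5, 1, 2, 2, 3],
}
toy_runs = shared_haplotype_runs(toy_calls, ["line_A", "line_B", "line_C"], 250_000)
print(toy_runs)
```

The length of each returned interval is the size of the shared block, so a block such as the 7.3-Mb one reported for the Sm1 locus would appear as a single long run when the bins are small enough.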
Extended Data Fig. 1 | Chromosome-scale collinearity between the RQA. Genomes were aligned chromosome by chromosome using MUMmer and are represented as dot plots. The introgression on chromosome 2B of LongReach Lancer (red rectangles) and 5B/7B translocation in SY Mattis and ArinaLrFor (purple rectangles) are indicated.
Extended Data Fig. 2 | Evaluation of the CDC Landmark RQA using Oxford Nanopore Long Reads. a, Scaffold-scaffold long read contact map showing shared read IDs between scaffold ends along the ordered scaffolds in the CDC Landmark pseudomolecules. The diagonal pattern indicates that adjacent scaffolds share the same long reads and are therefore properly ordered and oriented by Hi-C in the RQA. b, Characterization of inversion events on chromosomes 2A, 3A, and 3D. The directionality biases estimated from alignments of Hi-C data against Chinese Spring (left, top), and chromosome alignment of the inversion events between CDC Landmark and Chinese Spring RQAs (left, bottom) are shown. Long reads spanning the inversion events and magnified views of the reads aligning to the left and right boundaries of the inversions (right) are provided.
Article
Extended Data Fig. 3 | Diversity of genes and TEs. a, Average pairwise genetic diversity of the homeologues (coding sequences only) of the A, B and D subgenomes. The mode of the A, B and D subgenome is 0.00057, 0.00082, and 0.0002, respectively. b, Tajima’s D estimates of coding sequences for each wheat subgenome. The lower and upper range of the boxplot hinges correspond to the first and third quartiles (the 25th and 75th percentiles). Boxplots show centre line, median; box limits, upper and lower quartiles; whiskers, 1.5 × interquartile range. c, Total gene counts and orthologues for the RQA. Genes in orthologous groups with exactly one gene for each line (Complete; dark brown), genes contained in unambiguous orthologous groups missing an orthologue for at least one line, that is, PAV (2-10 Lines; light brown), and genes with ambiguous orthologues or CNV (Other; pink) are indicated. d, Per cent of pairwise shared syntenic fl-LTRs between wheat lines.
Extended Data Fig. 4 | Evolutionary relationships among PPR and mTERF gene sequences. a, The RFL clade is in blue and all remaining P-class PPRs are in green. b, Clustered mTERF sequences are in blue and the remaining mTERFs are shown in green. The scale bar represents number of substitutions per site. c, Sequence inversions and copy number variation at the Rf3 locus on chromosome 1B. RFL genes are shown as light pink triangles above the chromosome scale. Conserved non-PPR genes used as syntenic anchors are shown on the chromosome scale as coloured triangles. The total number (T) and the number of putatively functional RFL genes with 10 or more PPR motifs (F) are indicated on the right side of each panel.
Extended Data Fig. 5 | Identification of alien introgressions from wheat relatives. A feature of foreign chromosomal introgressions is that they contain unique patterns of TE insertions. Shown are stretches of >20 Mb containing multiple polymorphic RLC-Angela retrotransposons that are found only in one or a few (≤4) of the sequenced lines. One representative chromosome for each wheat subgenome is shown. Individual polymorphic retrotransposons are indicated as coloured vertical lines. Colours correspond to the number of cultivars a foreign segment is found in. Regions of particular interest are indicated by black rectangles. These include the 2NvS alien introgression from A. ventricosa at the end of chromosome 2A in Jagger, Mace, SY Mattis and CDC Stanley, as well as introgression in the central region of chromosome 2B from T. timopheevi in LongReach Lancer, and introgression at the end of chromosome 3D from T. ponticum in LongReach Lancer.
Extended Data Fig. 6 | Detailed characterization of the 2NvS introgression from A. ventricosa. a, Pairwise alignments of the first 50 Mb of chromosome 2A. The black arrow indicates a possible unique haplotype within spelt. b, Orthologous genes between the 2NvS introgression from A. ventricosa in Jagger and the genes on chromosomes 2A, 2B, and 2D in Chinese Spring. c, Frequency of 2NvS introgression carriers in North American datasets from CIMMYT, Kansas State, and the USDA Winter Wheat Regional Performance Nursery (RPN) over time. d, Per cent yield difference in lines that carry the 2NvS introgression. Two sided t-tests were performed to test for the significance of the impact of the 2NvS introgression. **P < 0.01; ***P < 0.001.
Extended Data Fig. 7 | Centromere positions and karyotype variation. Functional centromere positions in the RQA have undergone structural and positional rearrangement. Chromosome alignments showing collinearity (black scaffolds in same orientation, grey scaffolds in opposite orientation) with relative density of CENH3 ChIP–seq mapped to 100 kb genomic bins for Chinese Spring (blue) and a representative genome of comparison (red) for chromosome 4B of CDC Stanley (a), and chromosome 5B of Julius (b). c, Detailed list and clustering of cytological features carried by each wheat line (Supplementary Note 6). Features that are identical (dark grey) or have a gain (black) or loss (light grey) relative to Chinese Spring are indicated.
Extended Data Fig. 8 | Hi-C validates inversions identified from pairwise chromosome alignments. Pairwise alignments of chromosome 6B from the RQA and Chinese Spring are shown. Above each alignment dot plot, the directionality biases estimated from alignments of Hi-C data against Chinese Spring are shown. Boundaries of diagonal segments are indicative of inversions and coincide with inversion boundaries identified from the chromosome alignments.
Extended Data Fig. 9 | Characterization of a translocation involving wheat chromosomes 5B and 7B. a, Cytogenetic karyotypes of Forno (left) and Arina (right), the parents of ArinaLrFor. Note that the large recombinant chromosome 7B is represented by a distinct peak. b, Sequence of the translocation breakpoint on chromosome 7B of ArinaLrFor. Note that the exact breakpoint lies in a sequence gap (stretch of Ns). The bp positions are indicated at the left. Forward PCR primers are shown in red and reverse primers in blue. The overlap of the two reverse primers is shown in purple. The outer primer pair was used for PCR, while the inner pair was used for a nested PCR. c, PCR amplification of the fragment spanning the translocation breakpoint. The nested PCR yielded a ~5 kb fragment that spanned the translocation breakpoint and its identity was confirmed by sequencing. Both PCR and nested PCR were performed in duplicate; both replicates of the nested PCR were sequenced using the Sanger method. For gel source data, see Supplementary Fig. 1. d, Mapping of Illumina reads from the cultivars Arina and Forno on to the pseudomolecules of ArinaLrFor. Sequence derived from Forno is shown in blue, while sequence derived from Arina is in red. Note that chromosomes 5B and 7B are derived from both parents, indicating that these parental chromosomes can recombine freely, despite the presence of a large 5B/7B translocation in Arina.
Extended Data Fig. 10 | Confirmation of gene expression and gene structure for Sm1. a, Critical recombinants from the 99B60-EJ2G/Infinity and 99B60-EJ2D/Thatcher populations used to fine map Sm1. The 99B60-EJ2G/Infinity cross had 5,170 F2 plants, while 99B60-EJ2D/Thatcher cross had 5,264 F2 plants; only recombinant haplotypes between orange wheat blossom midge resistant (R) and susceptible (S) genotypes are shown. b, Oxford Nanopore long read confirmation of the Sm1 gene candidate in the CDC Landmark RQA (left), and alternative haplotype in Chinese Spring (right). Vertical coloured lines indicate sequence variants. c, Amplification of cDNA for the NB-ARC domain of the Sm1 gene candidate (top) and actin control (bottom) derived from RNA isolated from developing kernels (left) and wheat seedlings (right). Unity and CDC Landmark are carriers of Sm1. Waskada carries an alternative haplotype and does not carry Sm1 (see main text). Thatcher was used as a susceptible parent for fine mapping of Sm1 and does not contain the associated NB-ARC domain. The experiment was replicated on four independent biological samples for each condition. d, Distribution of an Sm1 allele-specific PCR marker in a diverse panel of >300 wheat lines.
nature research | reporting summary
Corresponding author(s): Curtis Pozniak
Last updated by author(s): Aug 11, 2020
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Data analysis A multitude of software and databases were used in this study, all of which have been listed, cited, or provided. These include:
DeNovoMAGIC v3.0, W2RAP (no versions, https://github.com/bioinfologics/w2rap), LongRanger v2.1.6, GATK v3.8, R v3.6.1 and v3.0.2,
BLAT v3.5, BLAST v2.8, MUSCLE v3.8, libsequence v1.8.3, EMBOSS v6.6.0, HMMER 3.1b2, PFAM v32.0, NLR-Annotator (no version,
https://github.com/steuernb/NLR-Annotator), Vmatch v2.3.0, TandemRepeatFinder v4.07b, LTRharvest genometools-1.5.9, HMMER
v3.0, MUMmer v3.23 (haplotype database) and v4 (all other analyses), HISAT v2.1.0, SNPrelate v3.11, BBTools/BBMap v38, ImageJ
v1.51n, minimap2 v2.13, FGENESH v2.6, NCBI Conserved Domain Search tool (no version, https://www.ncbi.nlm.nih.gov/Structure/cdd/
wrpsb), PROSITE release 2020_01, TMpred v25, STAR v2.6.0b, AUGUSTUS v3.2.3, GMAP v2017-06-20, EvidenceModeler v1.1.1, AHRD
v1.6, MCScanX v2.0, samtools v1.10, BEDtools v2.29, and custom data scripts (https://github.com/Uauy-Lab/pangenome-haplotypes;
http://people.beocat.ksu.edu/~jpoland/centromeres/).
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers.
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Data
All sequence reads have been deposited into the National Center for Biotechnology Information sequence read archive (SRA) (see Supplementary Table 1 for
accession numbers). Sequence reads for the RQAs, Th. ponticum, Ae. ventricosa and T. timopheevii have been deposited into the SRA (no. PRJNA544491) and ChIP-seq
short read-data used for centromere characterization is deposited as PRJNA625537. All Hi-C data has been deposited in the European Nucleotide Archive
(Supplementary Table 1). The RQAs and projected annotations are available for direct user download at https://wheat.ipk-gatersleben.de/. All RQA assemblies have
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf
Data exclusions All sequencing data generated was used in the genome assembly and analyses. Whenever possible, all data was included in the supporting
analyses. Data exclusion applies only to some of the subsequent supporting analysis, which was pre-established based on limitations in the
data. For example, we excluded the scaffolded assemblies from some analyses because the analyses required chromosome
pseudomolecules. We performed the diversity analysis both with and without the spelt genome, because spelt is a different
species, is much more diverged and biased the results.
Replication In all analyses that support the genome assemblies, the number of replicates or iterations are indicated in materials and methods or
supplemental tables. In each case, all replications were successful and were used. The genome assemblies themselves were validated using
multiple methods (i.e. BUSCO, genetic maps, Hi-C, 10x Genomics, cytology, and comparisons to Chinese Spring). The CDC Landmark assembly
was further validated using Oxford Nanopore long read sequencing. This helped validate the other approaches.
Randomization Randomization does not directly apply to the genome sequencing and assembly; however it applies to some of the supporting analyses. In
these cases, the group design and data seeding for computational analysis are described in the materials and methods and adhere to widely
accepted standards. For example, for the analysis of NLRs (Fig. 1c), 1 million random permutations were used. For the field experiments established
for phenotyping of Sm1, all samples were replicated and randomized using appropriate experimental designs.
Blinding Blinding does not apply to this study, as the study focuses on genome sequencing. This study focuses on plant genomics and the results of
the study are not impacted by the concealment of treatment, data, or groups.
Antibodies
Antibodies used Chromatin immunoprecipitation (ChIP) was performed using a wheat CenH3 antibody (Koo et al., 2015). An antigen with the
peptide sequence ‘RTKHPAVRKTKALPKK’, corresponding to the N-terminus of wheat CENH3, was used to produce the antibody
at the custom-antibody production facility of Thermo Fisher Scientific, Illinois, USA (abs@thermofisher.com).
A 0.396 mg antibody pellet was dissolved in 2 ml of PBS buffer, pH 7.4, giving a working concentration of 198 ng/µL.
Validation In the manuscript, we validated the antibody according to a previous study of Chinese Spring (Koo et al., 2015) and achieved near-identical
results (Supplementary Table 12). Additional controls were used in the study in which the antibody was substituted with
rabbit serum, which serves as a nonspecific binding control in the chromatin immunoprecipitation assay.
Data access links (May remain private before publication.) The data for the project has been deposited at NCBI: PRJNA625537 and analysis files are available for download:
http://people.beocat.ksu.edu/~jpoland/centromeres/
Files in database submission BED files, delta files (MUMmer), data analysis scripts
Methodology
Replicates NA. Samples were obtained from 2-week-old seedlings.
Sequencing depth Paired-end reads were generated at varying levels of read depth; data were deposited at NCBI (PRJNA625537).
Antibodies Wheat CenH3 antibody - see: Koo DH, Sehgal SK, Friebe B, Gill BS (2015) Structure and stability of telocentric chromosomes
in wheat. PLoS One 10: e0137747.
Peak calling parameters Reads mapped per 100kb bin were counted for each sample using BEDtools and output as a bed file. Scripts for data
analysis are provided at http://people.beocat.ksu.edu/~jpoland/centromeres/. Unlike studies involving transcription factors,
CENH3 ChIP-seq provides clear distinct peaks that are ~100 fold greater than background.
Data quality SAM output files from HISAT2 were converted to BAM, sorted and filtered for minimum alignment quality of 30 using
SAMtools.
Software Reads for each sample were aligned to each of the respective genome assemblies using HISAT2. Reads mapped per 100kb
bin were counted for each sample using BEDtools and output as a bed file. Scripts for data analysis are provided at
http://people.beocat.ksu.edu/~jpoland/centromeres/.
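The binning step described in the ChIP-seq methodology above can be sketched in a few lines. This is a pure-Python stand-in for the SAMtools filtering and BEDtools counting, not the deposited scripts: the input format (tuples of chromosome, position and mapping quality), the function names and the use of a fold-over-median threshold to flag enriched bins are all assumptions made for illustration.

```python
from collections import Counter
from statistics import median

BIN_SIZE = 100_000  # 100-kb bins, as stated in the methodology
MIN_MAPQ = 30       # minimum alignment quality, as stated in the methodology


def count_reads_per_bin(alignments, bin_size=BIN_SIZE, min_mapq=MIN_MAPQ):
    """Count alignments per (chromosome, bin index), dropping low-quality ones.

    `alignments` is an iterable of (chrom, position, mapq) tuples, a
    hypothetical simplification of SAM/BAM records.
    """
    counts = Counter()
    for chrom, pos, mapq in alignments:
        if mapq >= min_mapq:
            counts[(chrom, pos // bin_size)] += 1
    return counts


def enriched_bins(counts, fold=100.0):
    """Return bins whose count is at least `fold` times the median bin count.

    The median over non-empty bins is used as a crude background estimate;
    this thresholding rule is an assumption, not the published procedure.
    """
    if not counts:
        return []
    background = median(counts.values())
    return sorted(b for b, c in counts.items() if c >= fold * background)


# Toy data: three background bins and one strongly enriched bin.
toy_alignments = [("chr1", 10, 60), ("chr1", 150_000, 60), ("chr1", 350_000, 60)]
toy_alignments += [("chr1", 360_000, 5)]  # removed by the MAPQ filter
toy_alignments += [("chr1", 200_000 + i, 60) for i in range(200)]
toy_counts = count_reads_per_bin(toy_alignments)
print(enriched_bins(toy_counts))
```

Because CENH3 peaks are reported to be roughly 100-fold above background, even a crude threshold of this kind separates them cleanly in the toy data; real data would need per-sample depth normalization.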
Article
A staple food of ancient civilizations, today barley is used mainly for animal feed and malting. Barley is more adaptable to harsh environmental conditions than its close relative wheat, and maintains an important role in human nutrition in harsh climatic regions that include the Ethiopian and Tibetan highlands2. As in other crops, genomics has been a major driver of progress in barley genetics and breeding in the past decade3. The first draft reference genome for barley4, and its subsequent revisions5,6, have formed the basis for gene isolation7, compiling a single-nucleotide polymorphism (SNP) variation atlas for wild and domesticated germplasm8, and activating plant genetic resources9. At the same time, reduced-representation surveys of structural variation10 and map-based cloning11 have implicated variation in gene content and copy number in the control of agronomic traits. The concept of the pan-genome refers to a species-wide catalogue of genic presence/absence variation (PAV)12, or more generally, structural variation that affects (potentially non-coding) sequences of 50 or more base pairs (bp) in size. Although several methods of pan-genomic analysis that use short-read sequence data in the context of a single reference genome have been devised13, large and complex genomes require multiple high-quality sequence assemblies to capture and contextualize sequences that are absent in, or highly diverged from, a single reference genotype14. Progress in sequencing and genome mapping technologies has only recently made possible the fast and cost-effective assembly of tens of genotypes of
1Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, Seeland, Germany. 2Plant Genome and Systems Biology (PGSB), Helmholtz Center Munich, German Research
Center for Environmental Health, Neuherberg, Germany. 3Department of Plant Sciences, University of Saskatchewan, Saskatoon, Saskatchewan, Canada. 4Western Barley Genetics
Alliance, State Agricultural Biotechnology Centre, College of Science, Health, Engineering and Education, Murdoch University, Murdoch, Western Australia, Australia. 5Agriculture and
Food, Department of Primary Industries and Regional Development, South Perth, Western Australia, Australia. 6The James Hutton Institute, Dundee, UK. 7HudsonAlpha, Institute for
Biotechnology, Huntsville, AL, USA. 8Montana BioAg Inc, Missoula, MT, USA. 9Institute of Crop Sciences, Chinese Academy of Agricultural Sciences (ICS-CAAS), Beijing, China. 10College of
Agriculture and Biotechnology, Zhejiang University, Hangzhou, China. 11Bioproductivity Informatics Research Team, RIKEN Center for Sustainable Resource Science, Yokohama, Japan.
12Kihara Institute for Biological Research, Yokohama City University, Yokohama, Japan. 13Institute of Plant Science and Resources, Okayama University, Kurashiki, Japan. 14School of
Agriculture, Food and Wine, University of Adelaide, Glen Osmond, South Australia, Australia. 15School of Life Sciences, University of Dundee, Dundee, UK. 16School of Life Sciences
Weihenstephan, Technical University of Munich, Freising, Germany. 17Hubei Collaborative Innovation Centre for Grain Industry, Yangtze University, Jingzhou, China. 18German Centre for
Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany. 19Center for Integrated Breeding Research (CiBreed), Georg-August-University Göttingen, Göttingen, Germany.
20These authors contributed equally: Murukarthick Jayakodi, Sudharsan Padmarasu. ✉e-mail: C.Li@murdoch.edu.au; mascher@ipk-gatersleben.de; stein@ipk-gatersleben.de
Figure 1 axis labels recoverable from the plot: a, PC3 (2%) and PC4 (1.4%); b, Morex position (Mb).
Fig. 1 | Chromosome-scale sequences of 20 representative barley genotypes reveal large structural variants. a, We selected 20 barley genotypes to represent the genetic diversity space, as revealed by PCA of genotyping-by-sequencing data of 19,778 domesticated varieties of barley9. Principal component (PC)3 and PC4 are shown. The proportion of variance explained by the principal components is indicated in the axis labels. Further principal components are shown in Extended Data Fig. 1a. b, Alignment of the pseudomolecules of chromosome 2H of the Morex and Barke cultivars. The inset zooms in on a 10-Mb inversion that is frequently found in germplasm from northern Europe. Co-linearity plots for all assemblies and chromosomes are shown in Extended Data Fig. 3a.
large-genome plant species, such as barley (haploid genome size of 5 Gb)15.
Twenty barley reference genomes
The starting point for pan-genomics in barley was the comprehensive survey of species-wide diversity on the basis of the genome-wide genotyping of more than 22,000 barley accessions, mainly from the German national gene bank9. To achieve a good representation of major barley gene pools, we selected accessions that were located in the branches of the first six principal components from the previously published principal component analysis (PCA)9 (Fig. 1a, Extended Data Fig. 1), reflecting the key determinants of population structure: geographical origin, row type and annual growth habit. In addition to these gene pool representatives, our panel included the reference cultivar Morex5, two current or former elite malting varieties (RGT Planet and Hockett), two founder lines of Chinese barley breeding (ZDM01467 and ZDM02064), Golden Promise and Igri (two genotypes with high transformation efficiency16,17), Barke (a successful German variety and the parent of several mutant and mapping populations18,19) and one wild barley (H. vulgare subsp. spontaneum (K. Koch) Thell.) genotype from Israel (B1K-04-12, a desert ecotype collected at Ein Prat)20.
We constructed chromosome-scale sequence assemblies for 20 accessions (Extended Data Table 1). In brief, paired-end and mate-pair Illumina short reads were assembled into scaffolds of megabase (Mb)-scale contiguity (Extended Data Table 1). Scaffold assembly was done with Minia21 and SOAPDenovo22 following the TRITEX method6 (n = 16), DeNovoMagic from NRGene (n = 3) or W2rap23 (n = 1). We used 10X Genomics Chromium linked-reads and chromosome conformation capture (Hi-C) data to arrange scaffolds into chromosomal pseudomolecules using the TRITEX pipeline6 (Extended Data Table 1). A comparison of the short-read assembly of the Morex cultivar to a long-read assembly of this genotype generated from PacBio long reads showed high co-linearity at chromosomal scale, good concordance in gene space representation and similar power to detect PAV (Extended Data Fig. 2), indicating that short-read assemblies are amenable to pan-genomic analyses in barley. Although the assemblies of the 20 diverse accessions differed in contiguity and the extent of gap sequence in the intergenic space, they had a similar representation of reference gene models (Morex V2) and were highly co-linear to each other at the whole-chromosome scale (Fig. 1b, Extended Data Fig. 3). A similar proportion (about 80%) of the assembled sequence of each genotype was composed of transposable elements, with an average of 113,200 intact full-length long-terminal repeat retro-elements (LTRs) per assembly (Supplementary Table 1). However, we found pronounced differences in the number of shared intact full-length LTR locations: only 17 to 25% of full-length LTR locations present in the wild barley B1K-04-12 were shared at 98% sequence identity and 98% alignment coverage with any domesticated genotype (Extended Data Fig. 4). By contrast, more closely related domesticated genotypes shared between 53% and 67% of their full-length LTRs, consistent with previous reports of rapid sequence turn-over in the non-coding space in large-genome plant species24,25.
De novo gene annotation using Illumina RNA sequencing and PacBio Iso-Seq data (Supplementary Table 2) was performed for three genotypes: Morex (which has previously been reported6), Barke and the Ethiopian landrace HOR 10350 (Extended Data Fig. 5). Gene models defined on the basis of these three assemblies were consolidated and projected onto the remaining 17 assemblies (Extended Data Fig. 5). Between 35,859 and 40,044 gene models were annotated by projection in each assembly (Extended Data Table 1) with an average of 37,515 (s.d. = 896). The number of gene models was about 20% higher in the projections than in de novo annotations (Extended Data Fig. 5e), which indicates that some of the models lack transcript support: possible explanations for the discrepancy are highly tissue-specific expression or pseudogenization. The clustering of orthologous gene models yielded 40,176 orthologous groups. Of these, 21,992 occurred as a single copy in all 20 assemblies; 3,236 occurred in multiple copies in at least one of the 20 assemblies; 13,188 were absent from at least one assembly; and 1,760 were present in only one assembly. On average, 14.7% of gene models annotated in each assembly occurred in tandem arrays that comprised two or more adjacent copies. These results point to abundant genic copy-number variation between barley genotypes. Future transcriptomic studies will ascertain the effect of structural variants on gene expression.
Pan-genome as a tool for genetics and breeding
High-quality genome assemblies are a resource for ascertaining and providing context to structural variants, which can then be genotyped in a wider set of germplasm using low-coverage or reduced-representation sequence data. We used two complementary approaches to detect structural variation: assembly comparison and clustering of single-copy sequences to derive markers that can be scored in short-read data. We used the Assemblytics26 software to discover PAV by pair-wise comparison of 19 chromosome-scale assemblies to the Morex reference. We identified 1,586,262 PAVs, ranging in size from 50 to 999,568 bp, and observed an enrichment for low-frequency variants (Extended Data Fig. 6a, b). PAV density was higher in distal, gene-rich regions (Extended
Figure 2 labels recoverable from the plot: b, –log10(P value) by chromosome (1H to 7H), with markers 'Absent in Morex' and 'Present in Morex'; c, Morex 7H and HOR 7552 7H, 16,682 bp, 528.80 Mb, breakpoints, Nud, deletion; d, hulled and naked.
Fig. 2 | Single-copy pan-genome and use of PAVs in association mapping. a, Cumulative size of single-copy regions in genome assemblies of 20 barley genotypes. The genotypes were ordered according to the size of their unique single-copy sequence. b, Genome-wide association scan for lemma adherence on the basis of PAV markers. The black and red dots in the Manhattan plot denote single-copy sequences that are present and absent in Morex, respectively. c, The most highly associated PAV marker was a 16.7-kb region that is deleted in the naked accession HOR 7552 and that contains the NUD gene11. d, Allelic status of the NUD deletion in 196 domesticated varieties of barley. Normalized single-copy k-mer counts within the 16.7-kb region are shown for hulled (n = 160 genotypes) and naked varieties (n = 36 genotypes).
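The normalized k-mer counts in panel d rest on a simple idea: count how often the k-mers that make up a single-copy region occur in the reads of a sample. The following is a minimal sketch of that idea, not the published pipeline. The function names are hypothetical, k-mers are treated as strand-independent (canonical), and dividing by a user-supplied sequencing depth is an assumed normalization.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID_BASES = frozenset("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmers(seq, k):
    """Yield canonical k-mers of `seq`, skipping windows with ambiguous bases."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= _VALID_BASES:
            yield canonical(window)


def marker_abundance(marker, reads, k=31, depth=1.0):
    """Mean read count per marker k-mer, divided by `depth`.

    A value near 1 suggests one copy, near 0 suggests absence
    (assuming `depth` approximates the per-k-mer sequencing depth).
    """
    marker_kmers = set(kmers(marker, k))
    if not marker_kmers:
        return 0.0
    hits = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in marker_kmers:
                hits[km] += 1
    mean_count = sum(hits.values()) / len(marker_kmers)
    return mean_count / depth
```

Under these assumptions, a sample carrying a deletion of the marker region would return a value close to zero, whereas a sample carrying the region once would return a value close to one. At threefold coverage the estimate for any single marker is noisy, which is one reason to average over all k-mers of a region.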
Data Fig. 6c), which are characterized by higher nucleotide diversity and recombination rates8. A total of 5,446 out of 5,602 deletions longer than 5 kilobases (kb) found in Barke relative to Morex were mapped genetically in the 90 recombinant inbred lines of the Morex × Barke population19 with highly concordant positions (Spearman correlation = 0.99) (Extended Data Fig. 6d), which provides support for the accuracy of the detected polymorphisms. At least one member of 18,562 (46%) groups of orthologous genes overlapped with structural variants discovered in the 20 sequence assemblies. As observed in other plant species27, resistance-gene homologues containing NB-ARC and protein kinase domains were frequently found among PAV genes (Supplementary Table 3).
Structural variants cover non-genic regions composed of repetitive sequence, making it hard to establish orthologous relationships or the presence of specific alleles from short-read data only. To derive quantitative estimates of the extent of pan-genomic variation and as a tool for genetic analysis such as association scans, we focused on single-copy regions extracted from each of the 20 assemblies and clustered into a non-redundant set of sequences (hereafter referred to as the ‘single-copy pan-genome’) (Extended Data Fig. 7a). The average cumulative size of single-copy sequence in each accession was 478 Mb (that is, 9.5% of the assembled genome). The total size of non-redundant single-copy sequence was 638.6 Mb, represented by 1,472,508 clusters with an N50 of 1,087 bp (Extended Data Fig. 7b). The single-copy sequence shared among all 20 genotypes amounted to 402.5 Mb, whereas 235.9 Mb were variable (that is, absent or present in higher copy number in at least one assembly) (Fig. 2a). On average, each of the 20 genotypes contained 2.9 Mb of single-copy sequence not present in any other assembly. As observed for transposable element divergence, the wild barley B1K-04-12 had the highest amount of unique single-copy sequence (Extended Data Table 1).
To test the suitability of the single-copy pan-genome for genetic analysis in a wider diversity panel without high-quality genome sequences, we collected whole-genome shotgun data (threefold coverage) for 200 domesticated and 100 wild varieties of barley (Supplementary Table 4). The abundance of 160,716 single-copy clusters that overlap structural variants was estimated by counting cluster-constituent k-mers (k = 31) in sequence reads of the diversity panel. In addition, we analysed genotyping-by-sequencing data of 19,778 gene bank accessions of domesticated barley9 using the same approach. Abundance estimates based on k-mers (hereafter referred to as ‘pan-genome markers’) showed that loci detected as single-copy sequence in one genome assembly can vary in copy number from zero to many in diverse germplasm (Extended Data Fig. 7c). A PCA of pan-genome markers genotyped in whole-genome shotgun and genotyping-by-sequencing data highlighted the same drivers of global population structure as SNPs (Extended Data Fig. 7d–g). In genome-wide association scans for morphological traits, pan-genome markers revealed, with a good signal-to-noise ratio, peaks that are
[Fig. 3 graphic: panel a, RGT Planet plotted against Morex position (Mb), with a 141.5-Mb inverted segment; panels b and c, values plotted against 7H position (Mb) for two populations; panel d, Valticky and Diamant.]
Fig. 3 | Identification and characterization of a large inversion on chromosome 7H. a, Alignment of the 7H pseudomolecules of the Morex and RGT Planet cultivars. b, Alignment of physical and genetic positions mapped in the RGT Planet × Hindmarsh (R × H) (left) and Morex × Barke (M × B) (right) populations. Red shading marks the inverted region. c, We converted genetic distances to recombination rates in the R × H (left) and M × B (right) populations. A single marker per recombination block is shown. d, We designed a PCR marker (Supplementary Figs. 1, 2a) to screen for the presence of the inversion in gene bank accessions that represent the Valticky and Diamant cultivars.
consistent with previous reports9 (Fig. 2b, Extended Data Fig. 8). Notably, the pan-genome marker that was most highly associated with lemma adherence covered the NUDUM (NUD) gene11 (Fig. 2c). All varieties of naked barley, in which lemmas can be easily separated from grains, are thought to trace back to a single mutational event, deleting the entire NUD sequence11. Another putative knockout allele of NUD (nud1.g) that contains a likely disruptive SNP variant was recently found in Tibetan barley28. All 36 naked accessions in our panel contained the known deletion (Fig. 2d), indicating that broader sampling of barley diversity, with a particular focus on centres of (morphological) diversity, is needed to discover novel rare alleles by genomic analyses.
Compared to reference-free approaches for k-mer-based genome-wide association scans such as AgRenSeq29, trait-associated pan-genome markers are assigned with high precision to genomic positions, and aligning sequence assemblies in their vicinity provides immediate information about differences between haplotypes (Fig. 2c). Furthermore, the reduction of marker number by implicit clustering of k-mers into single-copy loci allows the use of standard mixed linear models30,31 to correct for genomic relatedness.

A map of polymorphic inversions
Chromosome-scale sequence assemblies can reveal large-scale rearrangements that are challenging to detect with other methods. Large inversions (more than 5 Mb in size) were prominent in the genome alignments of our 20 assemblies (Fig. 1b, Extended Data Fig. 3a, c). Previous reports on segregating inversions in barley are anecdotal and have focused on induced mutants32,33. To discover inversions in a broader set of germplasm, we mined patterns of contact frequencies in Hi-C data of a diversity panel mapped to a single reference genome34. Among 69 barley genotypes (67 domesticated and 2 wild accessions) (Supplementary Table 5), Hi-C-based inversion scans revealed a total of 42 events that ranged from 4 to 141 Mb in size (mean size of 23.9 Mb) (Extended Data Fig. 9a). Most of these events occurred in the low-recombining pericentromeric regions of the barley chromosomes and segregated at low frequency: 25 events were observed only once (Extended Data Fig. 9b, c, Supplementary Table 6). We focus here on two notable examples: a frequent event on chromosome 2H and an inversion in the distal region of the long arm of chromosome 7H.
The inversion in chromosome 7H detected in the RGT Planet cultivar was the largest event that segregated in our panel (141 Mb) (Fig. 3a). In a biparental mapping population derived from a cross between RGT Planet and the non-carrier cultivar Hindmarsh (Fig. 3b), this event repressed recombination in an interval that spanned 49 cM in the genetic map of the Morex × Barke population19, which is isogenic for absence of the inversion (Fig. 3c, Supplementary Table 7). We also observed a moderately distorted segregation (57% allele frequency, χ² = 4.88, P < 0.05) in favour of the Hindmarsh allele in this interval.
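The segregation-distortion statistic quoted above (χ² = 4.88, P < 0.05) is the kind of value a goodness-of-fit test with one degree of freedom produces. The following is a minimal sketch of such a test, assuming a 1:1 expected ratio of the two parental alleles; the function names and the example counts are hypothetical and are not taken from the paper, whose exact procedure and sample size are not stated here.

```python
import math
from typing import Tuple


def chi2_p_value_1df(chi2: float) -> float:
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom.

    For 1 d.f. the survival function reduces to erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


def segregation_chi2(count_a: int, count_b: int) -> Tuple[float, float]:
    """Goodness-of-fit test of two allele counts against a 1:1 ratio (assumed)."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("at least one genotype call is required")
    expected = total / 2.0
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    return chi2, chi2_p_value_1df(chi2)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not the study's data).
    chi2, p = segregation_chi2(57, 43)
    print(f"chi2 = {chi2:.2f}, P = {p:.3f}")
    # The reported statistic falls below the 0.05 threshold at 1 d.f.
    print(chi2_p_value_1df(4.88) < 0.05)
```

Because the statistic scales with sample size, the same 57% allele frequency can be non-significant in a small population and significant in a larger one, which is why the counts, and not only the proportion, are needed to reproduce a reported value.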
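[Fig. 4 graphic: panels a and b, PCA plots (PC2 axes shown), 2H inversion carriers versus other accessions; panel c, schematic of the inverted region with HvCEN at 448,638 bp and 432,644 bp from the breakpoint.]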
Fig. 4 | Analysis of a frequent inversion on chromosome 2H. a, A PCA showing the localization of inversion carriers in the diversity space of global domesticated barley. The correspondence of PCA coordinates to correlates of population structure is shown in Extended Data Fig. 1. Red dots denote carriers of the inverted haplotype (n = 87) in a panel of 200 domesticated varieties of barley. b, PCA for a diversity panel comprising 200 domesticated (red and green dots) and 100 wild varieties of barley (blue dots). SNP markers detected in whole-genome shotgun data and located in the inverted regions were used. c, Schematic of the inverted region. The HvCEN gene is closest to the breakpoint that is distal in Morex (distance of 449 kb) and proximal in Barke (distance of 433 kb) assemblies. A total of 46 and 44 high-confidence (HC) genes were annotated in the Morex and Barke assemblies, respectively. The yellow arrows (not drawn to scale) mark the positions of PCR primers to probe for the presence of the inversion (Supplementary Fig. 2c).
Recombination frequencies were increased in the flanking regions of the inversion in the RGT Planet × Hindmarsh population relative to Morex × Barke, which suggests a compensatory mechanism to maintain an average number of one-to-two crossovers per chromosome in the presence of large tracts of suppressed recombination35.
By focusing on the inversion breakpoints in the RGT Planet sequence assembly, we designed a diagnostic PCR assay (Supplementary Fig. 2a, b, d) to rapidly genotype the presence of the inversion in 1,406 accessions (Supplementary Table 8). The inverted haplotype occurred at low frequency (1.3%) in the whole panel, but was found in many lines in the RGT Planet pedigree (Supplementary Fig. 3), including commercially successful barley cultivars of past decades, such as Triumph, Quench and Sebastian. The earliest cultivar that carried the inversion was Diamant. As one of the donors of the semi-dwarf growth habit, Diamant was a highly influential founder line of modern barley breeding and traces back to a mutant induced by gamma irradiation of the Czech cultivar Valticky36. We genotyped several gene bank accessions and germplasm samples of both Valticky and Diamant. Notably, none of the Valticky samples carried the inversion, whereas it segregated in the Diamant samples (Fig. 3d). Quantitative trait loci mapping for yield-related traits in the RGT Planet × Hindmarsh population did not show signals on chromosome 7H (Supplementary Fig. 2e, Supplementary Table 9), consistent with selective neutrality of the inversion. This strongly suggests that mutation breeding in the 1960s has given rise to a cryptic large inversion, which, unbeknownst to breeders, segregates in elite varieties of barley.
The second inversion we focused on spanned 10 Mb in the interstitial region of chromosome 2H (Fig. 1b) and was present in 26 out of 69 Hi-C samples (Supplementary Table 8). Local PCA and haplotype analysis in our panel of 200 domesticated and 100 wild varieties of barley indicated a single origin of the inverted haplotype (Fig. 4a, b, Supplementary Fig. 2c). The inversion occurred only among domesticated barley of Western geographical origin9, indicating that it arose or has risen to high frequency after domestication. The inverted region contains 46 high-confidence genes in the Morex cultivar. The closest gene to the inversion breakpoint, at 448 kb distance from the distal breakpoint in the non-carrier Morex, was HvCENTRORADIALIS (HvCEN)37 (Fig. 4c). Although induced mutants of HvCEN flower very early, natural variation in HvCEN has previously been implicated in environmental adaptation to northern European climates37. All of the inversion carriers we analysed had HvCEN haplotype III, which is associated with later flowering in spring barley varieties from northern Europe37,38. Further research is required to determine whether the inversion close to HvCEN has direct functional consequences (for instance, by modulating HvCEN expression) or whether it hitchhiked along with a tightly linked causal variant.

Discussion
The digital representation of the pan-genome can expand the repertoire of natural or induced sequence variation that is accessible to genetic analyses and breeding. Our comparison of 20 chromosome-scale sequence assemblies has revealed pervasive variation in genes and non-coding regions. Focusing on single-copy sequences, we translated this variation into scorable markers that are amenable to population genetic analysis and association scans. A notable finding was the prevalence of large (more than 5 Mb in size) inversion polymorphisms in current elite germplasm. It is likely that the suppression of genetic recombination in inversion heterozygotes has manifested
Acknowledgements We thank M. Knauft, I. Walde and S. König for technical assistance; D. Schüler for sequence data management; J. Bauernfeind, T. Münch and H. Miehe for IT administration; D. Arend for help with data submission; M. Bayer for advice on transcriptome analysis; and M. Herz for pedigree information. This research was supported by grants from the German Federal Ministry of Education and Research to N.S., M.M., U.S., M.S. and K.F.X.M. (SHAPE, FKZ 031B0190), to U.S. and K.F.X.M. (de.NBI, FKZ 031A536) and to N.S. (COBRA, FKZ 031A323A); the Australian Grain Research and Development Cooperation (9176507) to C.L., K.C., P.L. and P.W.; JST CREST Japan (no. JPMJCR16O4 to K.M. and T.H.); JST Mirai Program Japan (no. 18076896 to K.S.); the National Key R&D Program of China (2018YFD1000701 and 2018YFD1000700) to D.X. and J.Z.; by funding from the China
Competing interests The authors declare no competing interests.

Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2947-8.
Correspondence and requests for materials should be addressed to C.L., M.M. or N.S.
Peer review information Nature thanks Victor Albert, Scott Jackson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Reprints and permissions information is available at http://www.nature.com/reprints.
Article
Extended Data Fig. 1 | Pan-genome selection in the global barley diversity space. PCA with genotyping-by-sequencing data of 19,778 varieties of domesticated barley sampled from the gene bank of the IPK9. The first six principal components are shown. Samples are coloured to highlight the pan-genome selection (first row), or according to geographic origin (second row), row type (third row) or annual growth habit (fourth row). The proportion of variance explained by the principal components is indicated in the axis labels of the first row. The map was created with the R package mapdata54.
Extended Data Fig. 2 | Comparison between long-read and short-read assemblies of the Morex cultivar. a, Co-linearity between Morex V2 (short-read) assembly and the Morex PacBio CLR assembly at the pseudomolecule level. b, Summary statistics of the Morex PacBio CLR assembly and Morex V2 assembly. c, Alignment of NUDUM locus (16 kb) between Morex PacBio CLR and Morex V2. d, Structural variants between Morex V2 and Morex PacBio CLR assemblies as detected and classified by Assemblytics. e, PAVs between Barke and the Morex V2 and Morex CLR assemblies.
Extended Data Fig. 3 | Assessment of contiguity and completeness in 20 genome assemblies. a, Whole-genome alignments of assemblies of 19 diverse barley accessions to the Morex V2 reference assembly. b, Alignment summary of full-length coding sequences (32,878) from the Morex V2 annotation and full-length cDNAs (28,622 full-length cDNAs) in each assembly. Alignments with less than 90% query coverage and 97% (less than 90% for full-length cDNAs) identity were discarded. c, Whole-genome alignments show some examples of large chromosomal inversions identified using Hi-C data.
Extended Data Fig. 4 | Pairwise shared syntenic full-length LTR locations. The wild variety B1K-04-12 is set apart as an outgroup, as it shares only 19–26% of its still-intact full-length LTR positions with the other landraces and cultivars. The highest similarity is found between the Barke and RGT Planet cultivars (67% shared full-length LTRs).
Extended Data Fig. 5 | Gene projection and transposable element annotation. a, Schematic of the gene projection workflow. TE, transposable element. b, Pipeline for annotation and removing transposable elements. c, Steps to identify tandemly arrayed gene (TAG) clusters in each assembly. d, Summary of gene projections and transposable element annotation in 20 accessions. e, Comparison between de novo annotations and gene projections for three genotypes. Reported counts refer to non-transposable-element genes.
Extended Data Fig. 6 | Summary of PAVs detected in pan-genome assemblies. a, Size distribution of PAVs. b, Number of PAVs between 20 genome assemblies. c, Distribution of PAVs along the barley genome. d, Co-linearity between physical position of PAVs detected between the Morex and Barke cultivars, and mapped genetically in the POPSEQ population.
Extended Data Fig. 7 | Analysis of the single-copy pan-genome. a, Pipeline used to select single-copy k-mers in PAVs as markers for genome-wide association scan analysis. b, Summary of single-copy sequence in 20 genome assemblies and results of their clustering. c, Copy number of single-copy sequences in a diversity panel comprising 200 domesticated and 100 wild accessions. Frequency ranges from blue (low) to red (high). d–g, Comparison of PCA on the basis of PAV and SNP variants in whole-genome shotgun data of 200 diverse accessions (d, e) and 19,778 varieties of domesticated barley9 (f, g). Top panels show PCA results from 160,716 PAVs; bottom panels show PCA results from 779,503 genotyping-by-sequencing SNPs. The accessions are coloured according to geographical origin and row type (using the colour code defined in Extended Data Fig. 1).
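The marker abundances summarized in this figure rest on counting the k-mers (k = 31) of a single-copy cluster in sequence reads. A minimal sketch of that idea follows. It is an illustration under stated assumptions and not the pipeline used in the study: the use of canonical (strand-independent) k-mers, the mean count per k-mer as the abundance measure, and all function names are hypothetical choices made here.

```python
from collections import Counter
from typing import Iterable, Iterator

K = 31  # k-mer length stated in the text
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def kmers(sequence: str, k: int = K) -> Iterator[str]:
    """Yield canonical k-mers, skipping windows with non-ACGT characters."""
    sequence = sequence.upper()
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        if set(window) <= set("ACGT"):
            yield canonical(window)


def cluster_abundance(cluster_sequence: str, reads: Iterable[str], k: int = K) -> float:
    """Mean number of read occurrences per distinct k-mer of the cluster (assumed measure)."""
    targets = set(kmers(cluster_sequence, k))
    if not targets:
        return 0.0
    counts = Counter()
    for read in reads:
        for kmer in kmers(read, k):
            if kmer in targets:
                counts[kmer] += 1
    return sum(counts.values()) / len(targets)


if __name__ == "__main__":
    # Toy input with a short k so the example stays readable.
    print(cluster_abundance("AAAAC", ["AAAAC", "GTTTT"], k=4))
```

An abundance near zero would indicate absence of the cluster in an accession, whereas values well above the sequencing depth would indicate additional copies; any normalization by coverage is left out of this sketch.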
Extended Data Fig. 8 | PAV-based genome-wide association scans using whole-genome shotgun and genotyping-by-sequencing data. a, Manhattan plots of PAV-based genome-wide association scans for morphological traits, including adherence of grain hull, row type, length of rachilla hairs and awn roughness, using whole-genome shotgun data from 200 diverse varieties of domesticated barley. b, PAV-based genome-wide association scan results for these traits using genotyping-by-sequencing data from 1,000 diverse varieties of domesticated barley collected from the gene bank of the IPK9. The 200 varieties of barley used for whole-genome shotgun sequencing are a subset of the 1,000 genotyping-by-sequencing genotypes.
Extended Data Fig. 9 | Characterization of large inversions in barley. a, Inversion size distribution. b, Recombination in inverted regions. Recombination rate was determined in the Morex × Barke RIL population19 (n = 90 genotypes). c, Number of inversions present as singletons or shared between two or more accessions on each chromosome.
Extended Data Table 1 | Summary statistics of 20 pan-genome assemblies and annotation
#Chromosomes 1H to 7H.
§Non-transposable element models or transposable-element filtered.
Article
https://doi.org/10.1038/s41586-020-2830-7
Received: 18 March 2020
Accepted: 22 July 2020
Kara L. Marshall1, Dimah Saade2, Nima Ghitani3, Adam M. Coombs1, Marcin Szczot3, Jason Keller4,5, Tracy Ogata2, Ihab Daou1, Lisa T. Stowers4, Carsten G. Bönnemann2, Alexander T. Chesler2,3 ✉ & Ardem Patapoutian1 ✉
The mechanotransduction channel necessary for urinary reflexes remains unknown. Several ion channels have been implicated in urinary tract function in vivo5–7, but none have been shown to be required for micturition reflexes. Moreover, it is not clear which cells are the primary sensors: umbrella cells of the innermost layer in the urothelium have been proposed to be mechanosensory8,9, but the bladder is also innervated by mechanically sensitive afferents from dorsal root ganglia (DRG)1,10. PIEZO2 is the primary mechanosensor that mediates touch, proprioception and mechanical allodynia in mice11–15. Loss-of-function mutations in PIEZO2 also resulted in complete deficits in these senses in humans13,16. Furthermore, PIEZO2 mediates interoceptive processes such as lung-stretch sensing and baroreception in mice17,18, but interoceptive deficits have not been studied in humans who are deficient in PIEZO2. As urination is driven by mechanical interoceptive reflexes, we investigated whether PIEZO2 is important for urination.
To understand how PIEZO2 contributes to urination in humans, PIEZO2-deficient individuals (n = 12; 5–43 years of age) answered questionnaires designed to capture pathology and validated against healthy control individuals to screen for voiding and elimination dysfunction19 (Fig. 1). We also assessed urological history, previous medical evaluations and non-invasive bladder ultrasound scans (Supplementary Table 1). All patients reported decreased voiding frequency, as low as once or twice daily, regardless of hydration status. Notably, the majority of individuals reported that they could spend an entire day without feeling the need to void and therefore followed a voiding schedule. A healthy frequency is defined as five to six voids per day. Despite the lack of normal sensory feedback, all patients had achieved continence at the time of evaluation except for one nine-year-old. However, many patients reported sudden urge incontinence, where any delay in voiding resulted in urinary accidents. Two individuals reported occasional nocturnal enuresis, and four had stress incontinence caused by laughter, cough and/or postural changes, with one case being severe enough to require treatment. Several patients had a sensation of incomplete voiding and an irregular urinary stream. Three adults described a sensation of pelvic heaviness when their bladder was full, and all three independently reported voiding by leaning over or using their hands to apply pressure to their lower abdomen. Overall, these data suggest that PIEZO2 has a key functional role in human urination.
We next carried out studies in mice to understand where and how PIEZO2 functions in the urinary tract. To test whether Piezo2 is present in bladder sensory neurons, we used RNA fluorescent in situ hybridization (FISH) in DRG tissue taken from three mice after injection of cholera toxin B–Alexa Fluor 488 (CTB), a neuronal tracer, into the bladder wall (Fig. 2a). Out of 92 bladder-innervating neurons labelled with CTB, 75 expressed Piezo2 transcript (81.5%). Piezo2 transcript was also detected in a subset of bladder urothelial cells expressing Krt20 (Fig. 2b), a marker of umbrella cells that line the bladder lumen and have been proposed to contribute to detection of bladder filling8. Seventy-four per cent of
1Howard Hughes Medical Institute, Department of Neuroscience, Dorris Neuroscience Center, The Scripps Research Institute, La Jolla, CA, USA. 2National Institute of Neurological Disorders
and Stroke, National Institutes of Health, Bethesda, MD, USA. 3National Center for Complementary and Integrative Health, National Institutes of Health, Bethesda, MD, USA. 4Department of
Neuroscience, Dorris Neuroscience Center, The Scripps Research Institute, La Jolla, CA, USA. 5Present address: Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA, USA.
✉e-mail: alexander.chesler@nih.gov; ardem@scripps.edu
Fig. 1 | Urinary dysfunction in individuals deficient in PIEZO2. Patient numbers correspond to those in Supplementary Table 1. Grey indicates a neutral answer or one not indicating pathology. Urinary frequency information is scored differently to the other questions, and is colour-coded according to the pathological score assigned to the answer in the questionnaire. Unless otherwise noted, the colour code indicates the following: grey, never (no pathology); blue, less than half of the time (pathology score of 1); yellow, half of the time (pathology score of 2); orange, more than half of the time (pathology score of 3); red, every day or every night if night time is indicated in question (pathology score of 4). Asterisk indicates an unanswered question. Dagger indicates that the individual answered twice per day during clinical interview.
umbrella cell nuclei were associated with Piezo2 transcript, and 12.6% of cells showed high expression of Piezo2 (1 s.d. above the mean). The density of Piezo2-positive cells varied across the bladder. These results show that Piezo2 is expressed in two distinct cell types in the lower urinary tract and could function in detecting relevant mechanical stimuli. We confirmed that urothelial cells express other mechanosensory proteins (Extended Data Fig. 1), but the functional deficits in PIEZO2-deficient humans focused our work on this protein.
We next used calcium imaging to determine whether bladder-stretch responses in sensory neurons were dependent on Piezo2. Whether neurons detect bladder stretch directly or downstream of urothelial cell activation, we expect this stimulus to cause calcium influx. We injected a viral vector carrying Cre recombinase into postnatal day 0 (P0) to P3 pups carrying the Cre-dependent calcium indicator GCaMP6f and a conditional Piezo2-knockout allele, Piezo2cKO (Piezo2fl/fl-GCaMP6f+/+; controls were GCaMP6f+/+)13. Thus, Piezo2 was deleted anywhere GCaMP6f was expressed. We observed rapid, robust responses in control sacral level 1 (S1) DRG neurons in response to manual, high-pressure bladder filling with saline (Fig. 2c), but these responses were markedly attenuated in Piezo2cKO DRG cells (Fig. 2d). Notably, cells responding to low-pressure stimuli were completely absent in Piezo2-knockout DRG (Fig. 2e). Calcium traces for cells in wild-type DRG revealed graded responses to pressure stimuli, with many cells responding to low and high pressures, but Piezo2cKO cells were silent at low pressures. Piezo2cKO DRG also had fewer cells responding to bladder stretch (Fig. 2f–m), but normal numbers of cells responding to painful pinch (Extended Data Fig. 1). This suggests that PIEZO2 is a key sensor of bladder stretch.
Mechanically evoked micturition reflexes coordinate the bladder detrusor muscle and the urethral sphincter muscles to mediate efficient urinary control, and are critical for efficient voiding4. We hypothesized that reflex responses rely on PIEZO2 function to provide feedback control over bladder pressure and urethra activity. We therefore investigated micturition reflexes in mice lacking PIEZO2 in all caudal tissues. These Hoxb8Cre;Piezo2fl/fl mice express Cre recombinase in bladder-innervating DRG neurons14 and bladder urothelium (Extended Data Fig. 2a). We used cystometry with urethral electromyography to simultaneously monitor bladder pressure and sphincter activity. With continual filling at 30 μl min−1, control mice initiated bladder contractions at regular intervals, which are observed as pressure peaks that correspond to micturition events (Fig. 3a, Extended Data Fig. 2b). Piezo2-knockout mice displayed irregular micturition timing (Fig. 3b, Extended Data Fig. 2c) and, on average, longer intervals between bladder contractions that resulted in urination (Fig. 3c, Extended Data Fig. 2h). Therefore, PIEZO2-deficient mice are less sensitive to bladder filling, as it takes more volume to initiate bladder contractions. Piezo2-knockout mice also displayed higher bladder pressures five seconds before contraction peaks (Fig. 3d, Extended Data Fig. 2i). In healthy animals, low pressures are maintained during bladder filling because the detrusor muscle relaxes. The higher pressures that were observed before contractions suggest that this relaxation process or bladder compliance is impaired in Piezo2-knockout mice.
We next investigated individual micturition events to determine whether the bladder pressure required to sustain micturition was abnormal in these mice. We observed consistent pressure increases within and among wild-type mice (Fig. 3e, Extended Data Fig. 2d), but bladder pressure traces in Piezo2-knockout mice were highly variable (Fig. 3f, Extended Data Fig. 2e). The knockout mice exhibited higher peak bladder pressures (Fig. 3g), and required significantly more pressure during contractions, suggesting that the detrusor muscle must work harder to accomplish micturition (Fig. 3h, Extended Data Fig. 2j).
We also assessed whether sensory input via PIEZO2 is important for urethral reflexes, which sustain efficient urination. During bladder contractions in wild-type mice, there was coordinated engagement of the urethra muscle (Fig. 3i). This reliable urethral activity was markedly attenuated in Piezo2-knockout mice (Fig. 3j, k, Extended Data Fig. 2k). Knockout urethra responses varied from silent or weak coordination to inappropriately timed hyperactivity (Extended Data Fig. 2c). Hyperactivity is a sign of detrusor–sphincter dyssynergia, a condition involving uncoordinated communication between the muscle groups responsible for urination. This indicates that the urethra was not receiving the appropriate sensory input to govern its activity during micturition. Mice lacking PIEZO2 also had more variable, larger void volumes, as longer periods between contractions allow more bladder filling (Fig. 3l). Together, these data indicate that PIEZO2 sets stretch sensitivity in the lower urinary tract and initiates appropriately timed reflexes that contribute to efficient urination.
Next, we tested whether urination behaviour is altered in Piezo2-knockout mice. We placed mice on filter paper for 4 h and
[Fig. 2 graphic: panel e, number of responding cells for high-pressure (>20 mmHg) and low-pressure (<20 mmHg) stimuli; panels f, g, pressure traces for WT and KO (scale bars, 20 mmHg and 300 s); panels h, i, calcium average of all cells (scale bar, 10% ΔF/F); panels j, k, calcium traces (% ΔF/F); panels l, m, calcium peak (% ΔF/F).]
Fig. 2 | Piezo2 is expressed in the lower urinary tract, and sensory neurons require PIEZO2 to detect low-pressure bladder filling. a, DRG neurons were retrogradely labelled using CTB (cyan, left) and fluorescent in situ hybridization (FISH) of DRGs with probes targeting Piezo2 (magenta, middle). Arrowheads point to Piezo2-expressing bladder neurons. Scale bar, 50 μm. The tracing experiment was repeated using three mice, n = 22–36 cells analysed per mouse. b, FISH for Krt20 (green) and Piezo2 (magenta) in bladder. Arrowheads point to Piezo2-expressing umbrella cells. Scale bar, 50 μm. FISH was performed on three bladders, with two technical replications. Analysis was performed on 80–117 nuclei per bladder. c, d, Image z-stack from GCaMP6f+/+ control mouse (c) and Piezo2 cKO mouse (d) S1 DRG during bladder filling. e, Count of cells responding to low-pressure (black) and high-pressure (red) bladder-filling stimuli in wild-type (WT) (n = 3 mice) and Piezo2 cKO (KO) (n = 4 mice) DRGs. f, g, Example pressure trace from wild-type (f) and Piezo2 cKO (g) DRG. Stimuli were interleaved during recording, but are shown sorted low to high, hence the discontinuous line. Data below these graphs are sorted together with the respective pressure peaks. h, i, Average per cent change in calcium fluorescence for all responding cells during the pressure peaks shown in f, g, respectively. j, k, Calcium traces for individual wild-type cells that responded to pressure stimuli in f (n = 17) (j) and for Piezo2 cKO cells responding to pressure stimuli shown in g (n = 6) (k). Each cell’s responses are shown on the same horizontal line. Cells are sorted by cumulative response to the four lowest-pressure stimuli. l, m, Maximum calcium response for the corresponding cells in j, k, 1 s after pressure peak.
imaged the resulting urination patterns with UV illumination. We used only female mice to preclude territorial scent-marking behaviour. Wild-type mice typically urinated in the corners and edges of the cage in large spots (Fig. 3m). Piezo2-knockout mice had a variety of urination patterns, and some displayed urine leaking (small spots) or large voids towards the cage centre (Fig. 3n, o). This phenotype was not attributed to the knockout mice spending more time in the middle of the cage (Fig. 3p). Thus, knockout mice have abnormal urination behaviour, including some apparent incontinence.
We next studied whether this observed urinary dysfunction in Piezo2-knockout mice led to long-term consequences. Chronic urinary tract dysfunction typically causes tissue remodelling as the bladder wall grows thicker to compensate for inefficient voiding20,21. This remodelling can eventually result in ‘decompensation’, which is marked by a flaccid, ineffective bladder with sequelae of incomplete voiding, vesicoureteral reflux and increased frequency of urinary tract infections. Bladder-wall thickening was observed by haematoxylin and eosin staining in Piezo2-deficient mice (Fig. 3q–s). The weight of freshly excised bladders also revealed bladder-wall remodelling, as bladders from the knockout mice were significantly heavier than those from wild-type littermates (Fig. 3t, Extended Data Fig. 2m). Thus, impaired urinary reflexes in Piezo2-knockout mice lead to detrusor hypertrophy, an indicator of chronic voiding dysfunction.
We next investigated the cell types in which PIEZO2 was required. We tested whether PIEZO2 deficiency in urothelial cells changed the pressure threshold that is required to initiate micturition. We used the Upk2-cre allele to abrogate PIEZO2 activity in urothelial cells (Extended Data Fig. 3a), which have been proposed to act as stretch sensors and communicate to underlying neurons using ATP7,9,22–24. We found that Upk2-cre;Piezo2fl/fl knockout mice displayed similar phenotypes to the Hoxb8-cre;Piezo2fl/fl knockout mice, with higher bladder stretch thresholds, increased bladder pressure during micturition and attenuated urethral reflexes (Fig. 4a–h). In combination with expression data from FISH (Fig. 2b), these data indicate that PIEZO2 acts in umbrella cells to help set bladder-stretch sensitivity and initiate appropriate micturition reflexes. These results confirm the proposed role for umbrella cells as mechanosensory cells that participate in initiating micturition8.
We observed similar phenotypes in mice that lacked PIEZO2 only in sensory neurons (Fig. 4i–p). Deleting PIEZO2 in all sensory neurons is lethal, so we used Scn10a-cre mice25 (Scn10a encodes the voltage-gated sodium channel Nav1.8) to delete PIEZO2 in the Aδ- and c-fibre subsets, which are the primary sensory neuron types described
[Fig. 3, panels a–t. Axes and panel titles: pressure (mmHg; scale bars, 20 mmHg and 100 s); bladder pressure (mmHg) against time from peak (s) for wild type (e) and knockout (f); volume (μl); filter paper urination for wild type (m) and knockout (n); urine in centre (%) (o); time in centre (%) (p); bladder muscle thickness (μm) (s); bladder weight (mg) (t); WT and KO groups throughout.]
Fig. 3 | PIEZO2 is required for efficient micturition reflexes. a, b, Example pressure traces from three female wild-type mice (a) and three female Hoxb8-cre;Piezo2fl/fl mice (KO) (b) during continuous bladder filling. c, d, Bladder-contraction intervals (P < 0.0001) (c) and bladder pressure five seconds before contraction peaks (P < 0.0001) (d). e, f, Heat maps showing bladder contractions in wild-type (e) and Hoxb8-cre;Piezo2fl/fl (f) female mice. Each row represents bladder pressure during a single micturition event, with peaks aligned at 0. Arrowheads mark where data from one animal end and data from another begin. g, h, Peak bladder pressures (P < 0.0001) (g) and area under the curve (AUC) for bladder contractions (P < 0.0001) (h). i, j, Heat maps showing urethra activity in wild-type (i) and Hoxb8-cre;Piezo2fl/fl (j) female mice from e, f, with rows corresponding to bladder-contraction events in e, f. k, Urethra activity during micturition (P < 0.0001). l, Void volume measurements (P = 0.03). Student’s t-tests with Welch’s correction. In c–l, n = 6 (wild-type) and n = 5 (Hoxb8-cre;Piezo2fl/fl) female mice; n = 10–29 bladder contractions analysed per mouse. m, n, Urination patterns of five wild-type (m) and five Hoxb8-cre;Piezo2fl/fl (n) mice. o, Quantification of urine in the middle 50% of the cage (P = 0.0001). n = 11 female mice per group. p, Wild-type and Hoxb8-cre;Piezo2fl/fl mice spend similar amounts of time in the cage centre. q, r, Haematoxylin and eosin staining from wild-type (q) and Hoxb8-cre;Piezo2fl/fl (r) bladder sections, from 6- to 7-month-old littermates. Scale bars, 100 μm. The muscle layer is marked with vertical lines. s, t, Bladder muscle wall thickness (s; n = 5, P = 0.016) and total bladder weight (t; n = 9 (wild type) and 8 (Hoxb8-cre;Piezo2fl/fl), P = 0.0002). In o, s, t, Mann–Whitney test; otherwise, two-sided Student’s t-tests with Welch’s correction. Data are mean ± s.d.
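The cystometry measures named in this legend (contraction intervals, pressure five seconds before the peak, peak pressure and AUC) can be illustrated with a minimal Python sketch. This is not the authors' Matlab analysis. The function names, the 15 mmHg detection threshold, the 30 s minimum spacing between peaks and the synthetic trace are all assumptions made for illustration; the ±20 s window simply mirrors the time axis of the heat maps.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Contraction:
    peak_time_s: float         # time of the pressure peak (s)
    peak_pressure: float       # pressure at the peak (mmHg)
    pressure_5s_before: float  # pressure five seconds before the peak (mmHg)
    auc: float                 # area under the pressure curve around the peak (mmHg s)


def find_peaks(trace: Sequence[float], threshold: float, min_gap: int) -> List[int]:
    """Indices of local maxima >= threshold, keeping the tallest within min_gap samples."""
    peaks: List[int] = []
    for i in range(1, len(trace) - 1):
        if not (trace[i - 1] < trace[i] >= trace[i + 1]) or trace[i] < threshold:
            continue
        if peaks and i - peaks[-1] < min_gap:
            if trace[i] > trace[peaks[-1]]:
                peaks[-1] = i  # replace a nearby, smaller peak
            continue
        peaks.append(i)
    return peaks


def summarize(trace: Sequence[float], fs_hz: float, threshold: float = 15.0,
              min_gap_s: float = 30.0, window_s: float = 20.0
              ) -> Tuple[List[Contraction], List[float]]:
    """Per-contraction measures and the intervals between successive peaks."""
    half = int(window_s * fs_hz)   # samples on each side of the peak used for the AUC
    lead = int(5 * fs_hz)          # samples in five seconds
    events: List[Contraction] = []
    for p in find_peaks(trace, threshold, int(min_gap_s * fs_hz)):
        lo, hi = max(0, p - half), min(len(trace) - 1, p + half)
        # Trapezoidal rule; no baseline subtraction (an assumption).
        auc = sum((trace[k] + trace[k + 1]) / 2 for k in range(lo, hi)) / fs_hz
        events.append(Contraction(p / fs_hz, trace[p], trace[max(0, p - lead)], auc))
    intervals = [b.peak_time_s - a.peak_time_s for a, b in zip(events, events[1:])]
    return events, intervals


def synthetic_trace(centres: Sequence[int], n: int = 500) -> List[float]:
    """Hypothetical 1 Hz trace: 2 mmHg baseline with triangular 20 mmHg peaks."""
    return [2.0 + max(0.0, 18.0 - 2.0 * min(abs(t - c) for c in centres)) for t in range(n)]


events, intervals = summarize(synthetic_trace([100, 250, 400]), fs_hz=1.0)
print(intervals)
```

In a real recording, the trace would first be trimmed and filtered as the study's exclusion criteria require, and the threshold would need tuning to the preparation.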
in the bladder10,26. This mouse line does not induce recombination in urothelial cells (Extended Data Fig. 3b–e). Sensory-neuron-specific Piezo2-knockout mice displayed longer intervals between contractions (Fig. 4i), but the pressure before contractions was not different from that in wild-type mice, as it was in urothelial-specific- and full caudal-knockout mice (Fig. 4j). This implies that mechanosensory stimuli activate PIEZO2 in umbrella cells to initiate bladder relaxation during filling (Fig. 4b) and that neuronal mechanosensing is dispensable for this process. Alternatively, it is possible that bladders become fibrotic and less compliant in Upk2-cre;Piezo2fl/fl and Hoxb8-cre;Piezo2fl/fl mice, but not in Scn10a-cre;Piezo2fl/fl mice. Neuronal PIEZO2-knockout mice do require more bladder pressure for micturition and have highly attenuated urethral reflex responses (Fig. 4o, p). These data implicate PIEZO2 in mediating neuronal stretch responses that are critical for downstream urethral reflexes. Of note, mice with Piezo2 knockout in individual tissues did not display the marked bladder remodelling that was observed in full caudal-knockout mice (Fig. 3s, t, Extended Data Fig. 3f, g), suggesting that urothelial or neuronal PIEZO2 alone could still contribute to urinary function. These results indicate that there is a two-part signalling mechanism involving PIEZO2 in umbrella cells and
[Fig. 4, panels a–p. Axes and panel titles: bladder pressure (mmHg) against time from peak (s) for wild type and knockout (c, d, k, l); urethra activity (μV) against time from peak (s) (e, f, m, n); bladder contractions, contraction AUC (mmHg) (g, o); urethra contractions, sum activity (μV) (h, p); WT and KO groups throughout.]
Fig. 4 | PIEZO2 functions in both bladder urothelium and sensory neurons. a–h, Cystometry data from wild-type and Upk2-cre;Piezo2fl/fl (KO) mice. n = 5 wild type and 4 Upk2-cre;Piezo2fl/fl female mice; 18–49 bladder contractions analysed per mouse. Cartoon in the top right depicts the lower urinary tract, with Piezo2 KO tissue in red. a, b, Intervals between bladder-contraction voids (P < 0.0001) (a) and bladder pressures five seconds before peak contraction (P = 0.001) (b). c, d, Bladder pressure events during continuous filling cystometry in wild-type (c) and Upk2-cre;Piezo2fl/fl (d) mice. e, f, Urethra activity recorded during the bladder contraction events shown in c, d. g, h, Bladder pressure during micturition events (P < 0.0001) (g) and urethral reflex responses during micturition (P = 0.03) (h). i–p, Cystometry data from wild-type and Scn10a-cre;Piezo2fl/fl (KO) mice. n = 3 wild type and 3 Scn10a-cre;Piezo2fl/fl female mice; 11–24 contractions per mouse. Cartoon in the top right depicts the lower urinary tract, with Piezo2 KO tissue in red. i, j, Intervals between bladder contractions (P = 0.002) (i) and bladder pressures five seconds before peak contraction (j). k, l, Bladder pressure events during continuous filling cystometry in wild-type (k) and Scn10a-cre;Piezo2fl/fl (l) mice. m, n, Urethra activity recorded during the bladder contraction events shown in k, l. o, p, Bladder pressure during micturition events (P = 0.004) (o) and urethral reflex responses during micturition (P < 0.0001) (p). Data are mean ± s.d. Two-sided Student’s t-test with Welch’s correction. *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001 and ****P ≤ 0.0001.
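For readers unfamiliar with the test named in these legends, the following is a minimal Python sketch of the Welch statistic and the Welch–Satterthwaite degrees of freedom, which do not assume equal variances in the two groups. The function name `welch_t` and both sample lists are hypothetical; the numbers are not measurements from this study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a, se2_b = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical values, for illustration only (not data from the study).
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

A two-sided P value would then be read from a t distribution with `dof` degrees of freedom; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test in one call.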
sensory neurons that set bladder sensitivity and promote micturition reflexes. Further investigations are required to address how these cell types communicate.
We have used evidence from mice and humans to identify the mechanotransduction channel PIEZO2 as a critical mediator of urinary tract function. Absence of Piezo2 in mice does not result in urinary tract paralysis and death, and PIEZO2-deficient humans are still able to urinate. This indicates that there are mechanotransduction proteins other than PIEZO2 in the urothelium and lower urinary tract sensory neurons. For example, the mechanotransduction ion channels TMEM63B and PIEZO1 are widely expressed in the urothelium, and PIEZO1 partially mediates urothelial stretch responses in vitro27.
Our results suggest a two-part model of mechanosensory signalling in the urinary tract, which is reminiscent of epithelial cell–neuronal sensory machinery in the skin (Merkel cell–neurite complexes), lung (neuroepithelial bodies) and intestine (enterochromaffin cells)15,17,28,29. Our results also implicate umbrella cells in mediating bladder relaxation during filling, perhaps by signalling to bladder muscle and/or
loss-of-function either found us or were referred to our group through our network of international collaborators. Genotype information can be found in Supplementary Table 1, along with past treatments and diagnoses. One patient, P10, carried a nonsense and a deleterious splice site variant in compound heterozygosity. Also, as stated above, all patients presented with a profound congenital ubiquitous lack of proprioception, vibration, and specific loss of touch discrimination on glabrous skin. Detailed history, clinical evaluation and testing were conducted, including an in-depth review of urinary function, urological history, review of previous evaluations and non-invasive bladder ultrasound. Patients were recruited from all over the world and their ages ranged from 5 to 43 years (see table). Four adult patients (3 females and 1 male) provided their own history. None of the patients were taking any medications that could affect urinary function at the

Author contributions K.L.M. designed and performed all mouse cystometry, behavioural experiments and tissue histology, analysed data and, together with A.P., wrote the manuscript. D.S., T.O., C.G.B. and A.T.C. designed and performed the human clinical assessments. Calcium imaging and analysis was performed by N.G., K.L.M. and M.S. Retrograde labelling and FISH experiments were performed by K.L.M., A.M.C. and I.D. J.K. and L.T.S. contributed analytical tools for data analysis, technical support and conceptual project design. C.G.B, A.T.C. and A.P. contributed to project design and supervision. All authors discussed results and contributed to manuscript editing.
Competing interests The authors declare no competing interests.
Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2830-7.
Correspondence and requests for materials should be addressed to A.T.C. or A.P.
Peer review information Nature thanks Eric Honoré, Jon Levine and Mark Nelson for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Article
Extended Data Fig. 1 | The bladder urothelium expresses multiple mechanosensitive proteins, and PIEZO2 is not required for sensory neuron pinch responses. a, FISH in bladder tissue with probes against Krt20 (green) and Piezo1 (white). DAPI in blue. b, FISH in bladder tissue with probes against Krt20 (green) and Tmem63b (white). DAPI in blue. c, Z-projection of the standard deviation of responses from genital pinch in WT and d, Piezo2 cKO DRG. e, Quantification of peak responses during pinch shown as percent of baseline (each data point is one cell). n = 3 DRGs, 40 cells for WT, 4 DRGs and 69 cells for Piezo2 cKO DRGs.
Extended Data Fig. 2 | PIEZO2 is required for efficient micturition reflexes in male mice. a, Hoxb8-cre;Ai9 bladder tissue, fixed, frozen and mounted to show tdTomato (red) throughout the tissue, labelled with DAPI (blue). Scale is 100 μm. Expression was evaluated in two mice. b, Example pressure and urethra activity traces from three wild-type males and c, three Hoxb8-cre;Piezo2fl/fl knockout male littermates. d, Heat map of individual bladder contraction events in wild-type and e, knockout male mice, with corresponding urethra activity below in f and g respectively. h, Bladder contraction intervals for males. i, Bladder pressures five seconds before peak contraction for males. Note: 1,200 s was the length of one recording. These dots represent recording periods in which the animal had no successful urination events. j, Total bladder pressure for males and k, sum of urethra activity during bladder contractions. n = 6 males per group. P < 0.0001 for graphs in h, i, j and k, two-sided Student’s t-test with Welch’s correction. l, Body weights from a subset of mice whose bladder weights are shown in Fig. 3t, and m, bladder weights from animals in l, shown as a percentage of body weight. Red horizontal lines indicate means, vertical red bars indicate +/− standard deviation (shown where possible).
[Figure panels: a, Upk2-cre;Ai9; b, Scn10a-cre;Ai9; c–e; scale bars, 200 µm and 50 µm; bladder weight (mg) in WT and KO mice.]
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see our Editorial Policies and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Data analysis ImageJ (version 2.0.0-rc-49/1.51d) and AcqKnowledge software (version 4.4.2) were used, and versions were added to the supplementary information. Code availability statement in manuscript: "Code for calcium imaging analysis is previously published13. Matlab (R2018b) code was used for cystometry analysis and is available at: https://github.com/PatapoutianLab/cystometry."
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and
reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
April 2020
The raw data that support the findings of this study are available from the corresponding author upon reasonable request.
nature research | reporting summary
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf
Data exclusions We established exclusion criteria prior to collecting cystometry data: data from the first 30 minutes of cystometry recording were not used because bladder muscle activity has often not stabilized by then. Moreover, animals that displayed bladder leaking during recording were excluded from analysis, as leaking indicated a flawed seal and thus inaccurate filling responses.
Replication FISH experiments were independently replicated 2-3 times with the same results. Cystometry recordings were performed in independent
male and female cohorts, and results were replicated. To verify the reproducibility of experimental findings, we restricted the time of day that
cystometry recordings were done (Zeitgeber 8-14) and we performed every experiment in a cohort of male and female mice to compare to
their wildtype littermates.
Randomization The order of recordings for different genotypes was randomized. Beyond this, assigning animals to experimental groups is not relevant to this
study, as the groups are defined by genotype. Animals of different sexes were analyzed independently to remove this covariate.
Blinding The experimenter was blind to genotype when possible for all experiments. HoxB8Cre+;Piezo2f/f knockout mice have obvious motor
impairments, so it was impossible to keep the experimenter blind for these groups.
Laboratory: #029281) to create sensory-neuron specific and urothelial specific Piezo2 knockout animals, respectively. Each of these
Cre lines was also crossed with Ai9 mice (B6.Cg-Gt(ROSA)26Sortm9(CAG-tdTomato)Hze/J, Jackson Laboratory: # 07909) to assess Cre
expression.
Ethics oversight All experiments were performed within the protocols and guidelines approved by the Institutional Animal Care and Use Committees
Recruitment Patients were recruited on the basis of their biparentally inherited bi-allelic homozygous or compound heterozygous
nonsense variant mutations in the Piezo2 gene. Patients with PIEZO2 loss of function either found us, or were referred to our
group through our network of international collaborators. The nature of this group means that we are only analyzing patients
without functional Piezo2, which is the goal of the study.
Ethics oversight Research protocol approved by the Institutional Review Boards of National Institute of Neurological Disorders and Stroke
(NINDS, protocol 12-N-0095)
Note that full information on the approval of the study protocol must also be provided in the manuscript.
Clinical data
Policy information about clinical studies
All manuscripts should comply with the ICMJE guidelines for publication of clinical research and a completed CONSORT checklist must be included with all submissions.
Clinical trial registration This is not a clinical trial, but approval was through: NINDS, protocol 12-N-0095
Data collection Data were collected at the NIH between April 2015 and May 2020.
Article
Tetsuya Takano1 ✉, John T. Wallace1, Katherine T. Baldwin1, Alicia M. Purkey1, Akiyoshi Uezu1, Jamie L. Courtland2, Erik J. Soderblom1,3, Tomomi Shimogori4, Patricia F. Maness5,6, Cagla Eroglu1,2 ✉ & Scott H. Soderling1,2 ✉
https://doi.org/10.1038/s41586-020-2926-0
Received: 23 January 2020
Accepted: 19 August 2020
The majority of central nervous system (CNS) synapses are ensheathed by tiny astrocytic processes1,2. These astrocytic contacts are an integral functional compartment of the tripartite synapse, which is defined as the combination of pre- and postsynaptic neuronal, and perisynaptic astrocytic, processes3. At the synapse, astrocytes control basal synaptic transmission, neuromodulation, ionic balance and neurotransmitter clearance4–8. Furthermore, astrocyte and synapse development are interdependent processes that are regulated by dynamic bidirectional intercellular communication via secreted factors and cell adhesion molecules9–12. Historically, however, gaining molecular insights into perisynaptic astrocyte–neuron signalling has been hampered owing to the lack of biochemical methods for isolating this astrocytic compartment.
To identify proteins at the extracellular clefts between astrocytes and neurons, we developed a chemico-genetic in vivo BioID (iBioID) approach, based on reconstituting the enzymatic activity of a proximity-biotinylating enzyme, TurboID13, at astrocyte–neuron junctions (Fig. 1a). Recent studies have shown that split biotinylation constructs could recover enzymatic activity when they were in close proximity in the cell cytoplasm14,15. In this study, we used our glycosylphosphatidylinositol-anchored reconstitution-activated proteins highlight intercellular connections (GRAPHIC) strategy16 to direct N- and C-terminal TurboID fragments to the extracellular surface of neurons and astrocytes (Fig. 1a, Extended Data Fig. 1a). Among the two Split-TurboID construct pairs that we tested in HEK 293T cells (Extended Data Fig. 1a), the biotinylation activity of Split 1-TurboID was higher than that of Split 2-TurboID (Extended Data Fig. 1b, lane 8). We therefore used Split 1-TurboID for the remainder of this study, and named the two molecules N-TurboID and C-TurboID. We also used a GRAPHIC-tagged full-length TurboID construct, TurboID-surface, to biotinylate astrocyte surface proteins (Extended Data Fig. 1a, bottom).
In astrocyte–neuron co-cultures, astrocytes expressing TurboID-surface under the control of the GfaABC1D promoter17 (Extended Data Fig. 1c) exhibited biotinylation activity along their membranes (Extended Data Fig. 1d). Moreover, the reconstituted activity of Split-TurboID was found only at contact sites between neurons and astrocytes (Extended Data Fig. 1c), but not when either of the halves were expressed alone (Extended Data Fig. 1d). To investigate whether TurboID-surface or Split-TurboID biotinylates tripartite synapses in these cultures, astrocytes were co-transduced with GfaABC1D-mCherry-CAAX to mark astrocyte membranes, and synapses were labelled by immunostaining with pre- and postsynaptic markers (excitatory, VGLUT1 and HOMER1; inhibitory, VGAT and gephyrin). Both constructs mediated biotinylation that overlapped with astrocytic membranes and closely associated with excitatory and inhibitory synaptic markers (Extended Data Fig. 2a–d), demonstrating the functional reconstitution of TurboID transcellularly at perisynaptic astrocyte–neuron junctions in vitro.
To test their activity in vivo, the constructs were introduced into mouse brain astrocytes and/or neurons via retro-orbital injections
1The Department of Cell Biology, Duke University Medical School, Durham, NC, USA. 2Department of Neurobiology, Duke University Medical School, Durham, NC, USA. 3Duke Proteomics and
Metabolomics Shared Resource and Duke Center for Genomic and Computational Biology, Duke University Medical School, Durham, NC, USA. 4Molecular Mechanisms of Brain Development,
Center for Brain Science (CBS), RIKEN, Saitama, Japan. 5Department of Biochemistry, University of North Carolina School of Medicine, Chapel Hill, NC, USA. 6Department of Biophysics,
University of North Carolina School of Medicine, Chapel Hill, NC, USA. ✉e-mail: tetsuya.takano@keio.jp; cagla.eroglu@duke.edu; scott.soderling@duke.edu
[Fig. 1, panels c–f. c, Control, TurboID-surface and Split-TurboID (scale bar, 5 μm); d, e, TurboID-surface and Split-TurboID (scale bars, 1 μm); f, streptavidin colocalization, TurboID-surface versus Split-TurboID (P < 0.001).]
Fig. 1 | Identification of the astrocyte–neuron synaptic cleft proteome using in vivo Split-TurboID. a, Schematic of the Split-surface iBioID approach. b, Outline of Split-TurboID method using cell-type-specific AAVs. ITR, inverted terminal repeats; hSyn1, human synapsin 1 promoter; GPI, glycosylphosphatidylinositol; WPRE, woodchuck hepatitis virus post-transcriptional regulatory element; pA, polyadenylation. c, Confocal images of cortical expression of Split-TurboID or TurboID-surface coexpressed with neuronal eGFP and astrocyte mCherry–CAAX. d, e, Three-colour STED images showing biotinylated proteins adjacent to excitatory synaptic markers PSD95 and VGLUT1 (d), and inhibitory synaptic markers gephyrin and VGAT (e). f, The ratio of biotinylated proteins that colocalized with VGLUT1, PSD95, VGAT or gephyrin. n = 15 cells per condition from 3 mice. n = 3 biological repeats. Student’s paired t-test, comparing TurboID-surface and Split-TurboID. Data are mean ± s.e.m.
of adeno-associated viruses (AAVs)18 at postnatal day (P)21 (Fig. 1b, Extended Data Fig. 3a–c) and the mice were given subcutaneous biotin injections starting at P42 for 7 days (Fig. 1b)19. Biotinylated proteins were detected by immunoblotting and immunohistochemistry (Extended Data Fig. 4a–d) both for astrocyte-specific TurboID-surface and reconstituted Split-TurboID constructs. However, when Split-TurboID fragments were expressed alone, no biotinylation was observed (Extended Data Fig. 4a–c). These results show that the TurboID-surface and Split-TurboID constructs generate extracellular biotinylation in vivo.
To confirm that biotinylated proteins localize to neuron–astrocyte contacts, we labelled neurons with eGFP (using AAV PHP.eB-hSynI-eGFP) and astrocyte membranes with mCherry–CAAX (using AAV PHP.eB-GfaBC1D-mCherry-CAAX) and co-injected either astrocyte-specific TurboID-surface or Split-TurboID-expressing viruses. In both conditions, biotinylated proteins were located at the contacts between astrocytic and neuronal processes (Fig. 1c). Using super-resolution stimulated emission depletion (STED) microscopy, we found that biotinylated proteins surround excitatory and inhibitory synapses (Fig. 1d, e). More than 50% of TurboID-surface-induced biotinylation and more than 90% of Split-TurboID-induced biotinylation was closely associated with synaptic markers (Fig. 1f). The densities of synapses were not affected by either labelling approach (Extended Data Fig. 4e, f). Together, these results show that the TurboID-surface and Split-TurboID constructs effectively biotin-label perisynaptic contacts between astrocytes and neurons in vivo.

Perisynaptic cleft proteome discovery
To identify the tripartite synaptic proteins, proteins biotinylated by Split-TurboID or astrocyte-specific TurboID-surface constructs were purified and analysed by quantitative high-resolution liquid chromatography–tandem mass spectrometry (LC–MS) (Fig. 2a). When combined, the Split-TurboID and astrocyte-specific TurboID-surface datasets identified 776,376 peptides corresponding to 3,171 distinct proteins (Extended Data Fig. 4g). After three independent experiments and following removal of known contaminants19, 173 and 178 proteins were found to be significantly enriched (1.5-fold) in Split-TurboID and astrocyte-specific TurboID-surface fractions, respectively, compared with soluble TurboID control (Extended Data Fig. 4g–i, Supplementary Tables 1, 2). This enrichment approach is stringent, and thus may not identify all astrocytic proteins that are present at perisynaptic processes, as it selects only those that are overrepresented at synapses compared with other compartments.
A total of 118 proteins were common between the two datasets, yielding a high-confidence tripartite synapse proteome (Fig. 2b, Extended Data Fig. 4g–i, Supplementary Table 3). This list includes known tripartite synapse proteins such as neuroligin-3 and neurexin-1 (ref. 9), calcium channel auxiliary subunits that also regulate glutamate receptor trafficking (CACNA2D3, CACNG2 and CACNG3), excitatory synaptic proteins such as AMPA receptors (GRIA2 and GRIA3), and inhibitory synaptic proteins such as type A γ-aminobutyric acid (GABAA) receptors (GABRA1, GABRA4, GABRB2 and GABRG2) (Fig. 2b). By cross-referencing our proteomics data set with cell-type-specific gene-expression databases20,21, we found that messenger RNA for 33 of these proteins was enriched in astrocytes (RNA-sequencing expression ratio >1.0, diamonds in Fig. 2b), 76 were enriched in neurons (circles in Fig. 2b) and 5 proteins had equal or unknown distribution (Fig. 2b). Bioinformatics analysis showed that our high-confidence tripartite proteome contained known synaptic cleft proteins (29 proteins, 25%), cell adhesion proteins (18 proteins, 15%), channels (18 proteins, 15%), G-protein-coupled receptors and
[Fig. 2, panels a, b. a, workflow: AAV injection (Split-TurboID, TurboID-surface); biotin injection; mouse cortex dissection; biotinylated proteins purification (streptavidin beads); data acquisition (intensity against m/z). b, network groups: Split-TurboID; Split-TurboID and TurboID-surface; previous synaptic cleft proteomics; cell adhesion molecules; channels and VGCCs; receptors. Node labels recovered from the figure: Syp, Thy1, Gjc3, Slc30a3, Tspan2, Cacng3, Serpina1a, Atp8a1, Ngly1, Sv2b, Ctsd, Hapln4, Gria3, Hba-a1, Vdac3, Gstm7, Vdac2, Atp9a, Tpt1, Cpe, Ptk2b, Tenm1, Cst3, Cntnap2, Prxl2a, Syt7, Pdia3, Ppt1, Prdx4, Lynx1, Slc17a6, Gria2, Nrcam, Cdipt, Nlgn3, Pten, Alb, Cacng2, Nrxn1, Sacm1l, Apmap, Clic4, Serpinb1a, Sord, C3, Gabrg2, Tm9sf2, Pzp, Ppia, Scamp3, Ostc, Tfrc, Tenm2, Olfm1, Lrp1, C1qbp, Adck1, Hapln1, C1qb, Hsp90b1, Cntfr.]
Fig. 2 | The astrocyte–neuron synaptic cleft proteome. a, Outline of proteomic approach. b, Left, overlapping high-confidence proteins shared between the Split-TurboID and TurboID-surface enriched fractions. Right, clustergram topology of proteins in selected functional categories. Node titles show the corresponding gene symbols and node size represents log2 fold enrichment over negative control. Proteins for which gene expression is enriched in neurons (mRNA expression in astrocyte/neuron <1) are in circles, those for which gene expression is enriched in astrocytes (mRNA expression in astrocyte/neuron ≥1.0) are in diamonds. Edges are shaded according to the type of interaction (grey, iBioID; black, previously reported protein–protein interactions).
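The node encoding described in the Fig. 2 legend can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names, the gene labels and the example numbers are hypothetical, and treating a missing ratio as "unclassified" is an assumption.

```python
from math import log2


def marker_shape(astro_to_neuron_ratio):
    """Map an astrocyte/neuron mRNA expression ratio to the legend's node shape."""
    if astro_to_neuron_ratio is None:
        # No expression data available: leave the protein unclassified (assumption).
        return "unclassified"
    # Legend rule: ratio >= 1.0 is astrocyte-enriched (diamond), < 1 is neuron-enriched (circle).
    return "diamond" if astro_to_neuron_ratio >= 1.0 else "circle"


def log2_fold_enrichment(sample_signal, control_signal):
    """Log2 fold enrichment of a sample over the negative control (drives node size)."""
    return log2(sample_signal / control_signal)


# Hypothetical ratios for illustration only, not values from the study.
example_ratios = {"GeneA": 2.4, "GeneB": 0.3, "GeneC": None}
shapes = {gene: marker_shape(ratio) for gene, ratio in example_ratios.items()}
print(shapes)
print(log2_fold_enrichment(8.0, 2.0))
```

A ratio of exactly 1.0 falls on the astrocyte side here because the legend states the astrocyte criterion as ≥1.0.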
associated proteins (4 proteins, 3%), other receptors and associated proteins (16 proteins, 14%), secreted or extracellular matrix components (34 proteins, 29%), and proteins encoded by genes implicated in disorders, including autism spectrum disorder and schizophrenia (34 proteins, 29%) (Fig. 2b).
Adhesions between astrocytes and neurons have critical roles in orchestrating the concurrent development of synapses and morphogenesis of astrocytes9,22. To identify regulators of this process, we selected teneurin-2 (TENM2), teneurin-4 (TENM4) and NRCAM as candidate bridging molecules between astrocytes and neurons. To deplete target proteins in astrocytes, we used a CRISPR-based approach. We confirmed depletion of astrocytic NRCAM using this approach by quantitative western blot analysis (Extended Data Fig. 5a). NRCAM single guide RNA (sgRNA) in combination with astrocyte-specific Cas9 significantly diminished the level of NRCAM protein in mixed neuron–astrocyte cultures; this could be rescued by re-expression of sgRNA-resistant human NRCAM in astrocytes (Extended Data Fig. 5b, c). Next, we used this astrocyte-specific CRISPR-based approach in vivo to rapidly gain preliminary data on candidate proteins23 (Extended Data Fig. 5d, e). We retro-orbitally injected AAVs containing sgRNA for each candidate gene together with Cre recombinase under the control of an astrocyte-specific promoter (AAV PHP.eB-U6-sgRNA-GfaABC1D-Cre) into conditional Cas9 knock-in (KI) mice. Astrocyte-specific Cre expression was confirmed in vivo using a tdTomato Cre-reporter line (Extended Data Fig. 5f, g). We used either a negative control virus (AAV-empty sgRNA-GfaABC1D-Cre) or sgRNA virus against each target gene along with astrocyte-specific mCherry–CAAX to quantify astrocyte morphology.
Compared with controls, loss of TENM4 but not TENM2 in P42 mouse cortical astrocytes significantly decreased astrocyte territory volume and the infiltration of fine astrocyte processes into the neuropil (measured by neuropil infiltration volume (NIV)) (Extended Data Fig. 5h–k). By contrast, the deletion of NRCAM significantly increased NIV (Extended Data Fig. 5j, k), indicating that NRCAM is a negative regulator of astrocytic elaboration into the neuropil. Thus, we focused on NRCAM for further analysis.

NRCAM regulates astrocyte morphogenesis
To confirm that endogenous NRCAM is labelled by Split-TurboID in vivo, we used STED imaging, which showed that NRCAM colocalizes with biotinylated proteins in vivo (Extended Data Fig. 6a). NRCAM has previously been identified at contacts between axons and myelinating glia24,25 and has been studied as a neuronal protein regulating dendritic spine pruning26,27 but not, to our knowledge, in astrocytes. Cell-type-specific transcriptome analysis shows that levels of mRNA encoding NRCAM are higher in astrocytes than in neurons or oligodendrocytes20,21. We confirmed NRCAM protein expression in cultured astrocytes by western blot (Extended Data Fig. 6b). Next, we analysed NRCAM localization in astrocytes in vivo by STED microscopy, observing that endogenous NRCAM puncta colocalized with astrocytic membranes (Extended Data Fig. 6c, d).
NRCAM is known to function in part through a homophilic transcellular interaction28. In agreement, when we injected neuron-specific and astrocyte-specific Nrcam-expressing viruses into P21 mice (Extended Data Fig. 6e), we observed colocalization of sparsely expressed astrocytic haemagglutinin-tagged NRCAM (NRCAM–HA) with neuronal NRCAM–V5 (Extended Data Fig. 6f) by STED imaging at P42.
NRCAM is also expressed during early postnatal development26,27. Deletion of NRCAM from astrocytes during the first two weeks of development significantly increased astrocytic territory size and enhanced NIV when compared with controls (Extended Data Fig. 7a–g). These
[Fig. 3 image. STED panels labelled VGLUT1, NRCAM, VGAT, gephyrin, PSD95, ezrin, astrocytic NRCAM–HA and mCherry–CAAX (astrocyte process); scale bars 5 μm, 1 μm, 1.0 μm and 0.5 μm. Schematic labels: excitatory synapse, inhibitory synapse, ± NRCAM. Bar charts compare control, NRCAM sgRNA, NRCAM sgRNA + hNRCAM, neuroNRCAM sgRNA and NRCAM sgRNA + neuroNRCAM sgRNA; annotations include NS and P < 0.01.]
Fig. 3 | NRCAM controls astrocyte-neuron contacts in vivo. a, Schematic of the visualization of astrocytic NRCAM in vivo. b, Three-colour STED images demonstrating that astrocytic NRCAM is adjacent to excitatory synapses or inhibitory synapses. c, Schematic of astrocytic NRCAM distribution assay in vivo. d, Three-colour STED images showing mCherry–CAAX-positive NRCAM or ezrin adjacent to excitatory presynapses and inhibitory presynapses. e, f, Quantification of average distance between astrocytic NRCAM and VGLUT1 or VGAT (n = 30 puncta per condition from 3 brains). g, Schematic of in vivo astrocytic process–neuronal synapses contact assay. h–m, Three-colour STED images of an astrocytic process following deletion of astrocyte NRCAM adjacent to excitatory synapses (h) or inhibitory synapses (k). hNRCAM, human NRCAM; neuroNRCAM sgRNA, deletion of neuronal NRCAM. i, j, l, m, Quantification of average distance between astrocytic process and excitatory synapses (i, j) or inhibitory synapses (l, m) (n = 30 puncta per condition from 3 brains). n = 3 biological repeats. In e, f, i, j, Student’s paired t-test. In l, m, one-way analysis of variance (ANOVA) with Dunnett’s multiple comparison. Data are mean ± s.e.m. NS, not significant. Arrows in b, d, h highlight examples of adjacent synaptic fluorescent signals in the images.
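The legend reports values as mean ± s.e.m. and names Student's paired t-test for several panels. The sketch below shows how those two quantities are computed, using only the Python standard library. The function names and the sample numbers are hypothetical, and the code returns the t statistic only (no P value); it is not the authors' analysis pipeline.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a sample of at least two values."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate the s.e.m.")
    # s.e.m. = sample standard deviation / sqrt(n)
    return mean(values), stdev(values) / sqrt(len(values))


def paired_t_statistic(first, second):
    """Paired t statistic: mean of the pairwise differences divided by their s.e.m."""
    if len(first) != len(second):
        raise ValueError("paired samples must have equal length")
    differences = [b - a for a, b in zip(first, second)]
    centre, sem = mean_and_sem(differences)
    # Raises ZeroDivisionError if all differences are identical (sem == 0).
    return centre / sem


# Hypothetical distances, for illustration only.
print(mean_and_sem([1.0, 2.0, 3.0]))
print(paired_t_statistic([1.0, 2.0, 3.0], [3.0, 5.0, 7.0]))
```

The statistic has n − 1 degrees of freedom, where n is the number of pairs; converting it to a P value needs a t distribution, which the standard library does not provide.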
phenotypes were rescued by coexpression of sgRNA-resistant NRCAM–HA in astrocytes (Extended Data Fig. 7b–g). NRCAM is a type I membrane protein with a modular extracellular domain architecture that is composed of repeated immunoglobulin and fibronectin domains (Extended Data Fig. 7b). To determine whether extracellular interactions of NRCAM are required for astrocytic morphogenesis in vivo, we created two deletion mutants of human NRCAM: NRCAM(ΔIG), comprising residues 620–1193 and lacking the immunoglobulin domain; and NRCAM(ΔECD), comprising residues 1030–1193 and lacking both immunoglobulin and fibronectin domains (Extended Data Fig. 7b, c). Neither mutant rescued the morphology of NRCAM-deleted astrocytes (Extended Data Fig. 7d–g), indicating that the extracellular interactions via immunoglobulin domains of NRCAM are necessary for maintaining the wild-type morphology. To test whether
[Fig. 4 image. Immunoblots (kDa) of control IgG and anti-NRCAM immunoprecipitates probed for NRCAM, gephyrin, PSD95 and NRP2. Panel b, HEK293T co-culture schematic (inhibitory synapse, VGAT, NRCAM, NRCAM–HA, gephyrin, GABAA R, HEK 293T cell). Bar charts compare control, NRCAM–HA, neuroNRCAM sgRNA + NRCAM and gephyrin sgRNA + NRCAM, and gephyrin–VGAT colocalization for control against NRCAM sgRNA (annotations P < 0.005, P < 0.001, P < 0.01). Schematic: P42 Cas9 knock-in mice, AAV-NRCAM sgRNA + Cre, cortical layers II/III, IV and V. Panels j–n: mIPSC frequency (Hz), cumulative frequency against interval (s), and amplitudes split into fast (< 2.8 ms) and slow (> 2.8 ms) rise times (P < 0.001). Scale bars 10 μm and 2 μm.]
Fig. 4 | Astrocytic NRCAM controls inhibitory synaptic organization and function. a, Co-immunoprecipitation (IP) from cortical lysates of NRCAM with gephyrin, PSD95 and NRP2. b, Schematic of co-culture assay to identify effects of non-neuronal NRCAM–HA on inhibitory synaptic specializations. c, Images of NRCAM–HA coexpressed with eGFP in HEK 293T cells co-cultured with neurons depleted of NRCAM or gephyrin. d, Mean integrated intensity of GABAA receptor in contact with transfected HEK 293T cells counted from cells as in c (number of cells: n = 418 control, n = 416 NRCAM–HA, n = 297 neuroNRCAM sgRNA + NRCAM, n = 356 gephyrin sgRNA + NRCAM). e, Images of inhibitory synapses among NRCAM-deficient astrocytes. High magnification images (bottom) correspond to outlined areas (above), arrows indicate examples of colocalizing synaptic markers. f, Average number of inhibitory synaptic colocalized puncta within astrocyte territories from cells as in e. n = 15 cells per condition from 3 mice. g, Schematic of electrophysiology experiments in L2/3 pyramidal neurons of V1 cortex. h, mIPSC traces from L2/3 pyramidal neurons following astrocyte treatment with control (empty sgRNA) or NRCAM sgRNA. i–m, mIPSC amplitude (i, j), frequency (k, l) and rise time (m) (n = 20 cells per condition from 4 mice). n, mIPSC amplitudes sorted by fast and slow rise times. In d, f, one-way ANOVA with Dunnett’s multiple comparison; n = 3–6 biological repeats. In i, k, n, Student’s paired t-test. Data are mean ± s.e.m.
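Panel n sorts mIPSC amplitudes by rise time, and the panel labels give the split as fast (< 2.8 ms) and slow (> 2.8 ms). A minimal Python sketch of that sorting step follows. The function name, the tuple layout of an event and the example values are hypothetical; how events with a rise time of exactly 2.8 ms were handled is not stated, so this sketch leaves them in neither group.

```python
RISE_TIME_THRESHOLD_MS = 2.8  # threshold taken from the panel labels


def split_by_rise_time(events, threshold_ms=RISE_TIME_THRESHOLD_MS):
    """Split (rise_time_ms, amplitude) events into fast and slow amplitude lists."""
    fast = [amplitude for rise, amplitude in events if rise < threshold_ms]
    slow = [amplitude for rise, amplitude in events if rise > threshold_ms]
    return fast, slow


# Hypothetical events as (rise time in ms, amplitude), for illustration only.
example_events = [(1.5, 20.0), (3.2, 12.0), (2.8, 9.0), (2.0, 18.0)]
fast_amplitudes, slow_amplitudes = split_by_rise_time(example_events)
print(fast_amplitudes, slow_amplitudes)
```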
Data availability
Proteomics data are available in the MassIVE database under accession MSV000085821. The data that support the findings of this study are available from the corresponding author upon reasonable request.

46. Spence, E. F. et al. In vivo proximity proteomics of nascent synapses reveals a novel regulator of cytoskeleton-mediated synaptic maturation. Nat. Commun. 10, 386 (2019).
47. Shin, J. H., Yue, Y. & Duan, D. Recombinant adeno-associated viral vector production and purification. Methods Mol. Biol. 798, 267–284 (2012).
48. Takano, T. et al. LMTK1 regulates dendritic formation by regulating movement of Rab11A-positive endosomes. Mol. Biol. Cell 25, 1755–1768 (2014).
49. Takano, T. et al. Discovery of long-range inhibitory signaling to ensure single axon formation. Nat. Commun. 8, 33 (2017).

Author contributions T.T., C.E. and S.H.S. designed the study. T.T., J.T.W., A.P., C.E. and S.H.S. wrote the manuscript. T.T., J.T.W., A.U. and E.J.S. performed in vivo BioID-proteomics analysis. T.T., J.T.W., J.L.C., T.S. and P.F.M. produced the constructs. T.T., J.T.W. and K.T.B. performed imaging analysis and the morphological analysis of the astrocytes. A.P. performed electrophysiological analysis. T.T. and K.T.B. performed the biological experiments. All authors discussed the results and commented on the manuscript text.

Competing interests The authors declare no competing interests.

Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2926-0.
Correspondence and requests for materials should be addressed to T.T., C.E. or S.H.S.
Peer review information Nature thanks Thomas Biederer, Peter Scheiffele and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Extended Data Fig. 1 | The reconstituted activity of Split-TurboID in neurons and astrocytes in vitro. a, Schematics of constructs tested. b, Immunoblot analysis of construct expression and biotinylation activity. c, Schematic of neuron-astrocyte mixed-culture assay for Split-TurboID with cell-type-specific AAVs in vitro. d, Cultured neurons and astrocytes were infected with AAV1/2-GfaABC1D-TurboID-HA-surface, AAV1/2-hSynI-V5-N-TurboID and/or AAV1/2-GfaABC1D-C-TurboID-HA. Representative images of neurons and astrocytes at DIV14 after treatment with 500 μM biotin for 6 h are shown. n = 3 biological repeats.
Article
Extended Data Fig. 2 | Split-TurboID maps excitatory and inhibitory perisynaptic proteins. a–d, Representative images demonstrating that proteins biotinylated by astrocytic TurboID-surface or Split-TurboID (cyan) are adjacent to excitatory presynaptic marker VGLUT1 (a), postsynaptic marker HOMER1 (b), inhibitory presynaptic marker VGAT (c), and postsynaptic marker gephyrin (d). Astrocytes were visualized with GfaABC1D-mCherry-CAAX. n = 3 biological repeats.
Extended Data Fig. 3 | Brain-wide transduction of astrocytes and neurons. a, Schematic of AAV PHP.eB viruses for neuronal-EGFP or astrocyte-mCherry-CAAX and retro-orbital injection. b, Sagittal section of mouse brain showing expression throughout the cortex and other structures. c, Representative image from cortex, hippocampus or cerebellum showing high coverage of neuronal and astrocytic expression.
Extended Data Fig. 6 | NrCAM is a novel tripartite synaptic protein. a, A high-magnification STED image showing that endogenous NrCAM was enriched at biotinylated proteins in vivo. b, Immunoblot analysis of endogenous NrCAM, astrocyte marker GFAP, neuronal marker β-Tubulin III or loading control α-Tubulin from mouse brain or purified astrocyte lysate. c, Schematic of the visualization of astrocytic membrane and endogenous NrCAM in vivo. d, STED images demonstrating the localization of endogenous NrCAM in vivo. Coronal sections were immunostained with anti-NrCAM antibody (cyan). A high-magnification image is shown (right panel). e, Schematic of the visualization of both astrocytic and neuronal NrCAM in vivo. f, STED images demonstrating the colocalization of astrocytic NrCAM with neuronal NrCAM in vivo. Coronal sections were prepared and co-immunostained with an anti-V5 (cyan) and anti-HA (magenta) antibody. A high-magnification image is shown on the right. n = 3 biological repeats. Data represent means ± s.e.m.
Extended Data Fig. 7 | The role of astrocytic NrCAM in astrocytic morphogenesis in vivo. a, Schematic of CRISPR-based NrCAM deletion in vivo. b, Schematic of hNrCAM domains and fragments. SP, signal peptide; IG, immunoglobulin; FN, fibronectin; TMD, transmembrane domain; ECD, extracellular domains. c, Immunoblots showing the expression of each NrCAM fragment in HEK293T cells. d, f, h, j, Images of astrocytes following deletion of astrocyte NrCAM alone (NrCAM sgRNA), with coexpression with indicated constructs of sgRNA-resistant human NrCAM, neuronal NrCAM deletion (neuroNrCAM sgRNA), or following neuronal NrCAM deletion alone. Images at indicated ages represent. e, i, Analysis of astrocyte territory, 15–29 cells per condition from 3 mice; g, k, Analysis of neuropil infiltration volume, 50–51 cells per condition from 3 mice. n = 3 biological repeats. One-way ANOVA (Dunnett’s multiple comparison, P < 0.0001). Data represent means ± s.e.m.
Extended Data Fig. 9 | The effect of NrCAM on excitatory synapse formation and function in vivo. a, Images of postsynapse PSD95 and presynapse VGLUT1 within NrCAM-deletion astrocytes in L1 of the visual cortex. High magnification images (bottom) correspond to boxes (above). b, Quantification of average number of excitatory synaptic colocalized puncta within astrocyte territories. n = 15 cells per condition from 3 mice. c, mEPSC traces from L2/3 pyramidal neurons following astrocyte control empty sgRNA or NrCAM sgRNA expression. d–g, Quantification of mEPSC amplitude (d, e, Cont = 16, NrCAM sgRNA = 14 cells from 4 mice) and frequency (f, g, Cont = 14, NrCAM sgRNA = 17 cells from each of 4 mice). At least n = 3 biological repeats. Student’s t-test (paired, P < 0.05). Data represent means ± s.e.m.
Extended Data Fig. 10 | In vivo chemogenetics method, Split-TurboID, reveals a novel astrocytic cell adhesion molecule, NrCAM, that controls inhibitory synaptic organization. Development of in vivo chemo-affinity codes, Split-TurboID, and a working model of astrocytic NrCAM influencing inhibitory synaptic function. Split-TurboID can map the molecular composition of such intercellular contacts, even within the highly complex structure of the tripartite synapse in vivo. Mapping this interface, we discovered a new molecular mechanism by which astrocytes influence inhibitory synapses within the tripartite synaptic cleft via NrCAM. NrCAM is expressed in cortical astrocytes where it interacts with neuronal NrCAM that is coupled to gephyrin at inhibitory postsynapses. Loss of astrocytic NrCAM dramatically alters inhibitory synaptic organization and function in vivo.
nature research | reporting summary
Corresponding author(s): Scott H. Soderling, Cagla Eroglu and Tetsuya Takano
Last updated by author(s): Jul 24, 2020
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see our Editorial Policies and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Data analysis Imaris software (v8.2.1), Leica Application Suite (LAS) software (v3.5.5), ImageJ (v10.2), and Mascot Distiller and Mascot Server (v2.5, Matrix Science) were used for data analysis. The Minora Feature Detection algorithm is part of the Proteome Discoverer package, version 2.2.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and
reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
April 2020
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf
Blinding Image data collection and statistical analyses were performed blinded to the experimental conditions.
Antibodies
Antibodies used The following antibodies were used: monoclonal anti-V5 (ThermoFisher, R960-25, IB 1:1000, IF 1:500, IHC 1:500), rat anti-HA (Sigma,
12158167001, IB 1:1000, IF 1:500, IHC 1:200), mouse anti-HA (Biolegend, MMS-101P, IB 1:1000), chicken anti-GFP (Abcam, ab13970,
IB 1:1000, IF 1:1000, IHC 1:1000), rabbit anti-mCherry (Abcam, ab167453, IF 1:500, IHC 1:500), rabbit anti-PSD95 (Life Technologies,
51-6900, IHC 1:200), mouse anti-PSD-95 (ThermoFisher, 7E3, IB 1:1000), guinea pig anti-VGLUT1 (Synaptic Systems, 135-304, IF
1:1000, IHC 1:1000), rabbit anti-gephyrin (Synaptic Systems, 147-002, IF 1:1000, IHC 1:500), mouse anti-gephyrin (Synaptic Systems,
147-011, IB 1:1000, IF 1:300), guinea pig anti-VGAT (Synaptic Systems, 131-004, IF 1:1000, IHC 1:500), rabbit anti-NL2 (Synaptic
Systems, 129-202, IB 1:500), rabbit anti-NrCAM (Abcam, ab24344, IB 1:1000, IHC 1:200), rabbit anti-Homer1 (Synaptic Systems,
160002, IF 1:2000), rabbit anti-GABA-A receptor 2 (Synaptic Systems, 224-803, IF 1:1000), goat anti-Neuropilin-2 (R & D Systems,
AF567, IB 1:500), rat anti-tdTomato (Kerafast, EST203, IHC 1:1000), rat anti-Tubulin (Santa Cruz, sc-53029, IB 1:1000), rabbit anti-
Ezrin (Cell Signaling, #3142, IHC 1:200), rabbit anti-EAAT2 (GLT1) (Alomone, AGC-022, IB 1:1000), rabbit anti-Kir4.1 (Alomone,
APC-035, IB 1:500), rabbit anti-NL3 (Novus, NBP1-90080, IB 1:500), Alexa Fluor 488 Goat anti-Mouse (ThermoFisher, A32723), Alexa
Fluor 488 Goat anti-Rabbit (ThermoFisher, A-11034), Alexa Fluor 488 Goat anti-Guinea pig (ThermoFisher, A11073), Alexa Fluor 488
Goat anti-Chicken (ThermoFisher, A-11006), Oregon Green 488 Goat anti-Rabbit (ThermoFisher, O-11038), Alexa Fluor 555 Goat anti-
Rabbit (ThermoFisher, A21428), Alexa Fluor 568 Goat anti-Rat (ThermoFisher, A-11077), Alexa Fluor 594 Streptavidin (ThermoFisher,
S11227), Alexa Fluor 647 Donkey anti-rabbit (ThermoFisher, A31573), Alexa Fluor 647 Goat anti-Chicken (ThermoFisher, A-21449),
Alexa Fluor 647 Donkey anti-Guinea pig (Jackson ImmunoResearch, 706-605-148), Alexa Fluor 647 Streptavidin (ThermoFisher,
S21374), Atto647N anti-Mouse (Sigma, 50185), Atto647N anti-rabbit (Sigma, 40839), Donkey anti-Goat IRDye 800CW (LI-COR,
926-32214), Goat anti-rat IRDye 800CW (LI-COR, 925-32219), Goat anti-Mouse IRDye 680RD (LI-COR, 925-6818).
Validation
1 Monoclonal anti-V5 (ThermoFisher, R960-25). Applications: ELISA, immunocytochemistry, immunofluorescence, immunoprecipitation, western blot. Validated by: vendor (IB, IF).
2 Rat anti-HA (Sigma, 12158167001). Applications: ELISA, immunocytochemistry, immunofluorescence, immunoprecipitation, western blot. Validated by: Hougbing Liu et al., 2014. J Am Heart Assoc 20; 3(3) (IB); Fimiani et al., 2015. Nucleic Acids Res 18;43(16) (IB, IF); Stogsdill et al., 2017. Nature.
Mycoplasma contamination The cell lines were tested for mycoplasma contamination and were negative.
Commonly misidentified lines No commonly misidentified cell lines were used in the study.
(See ICLAC register)
Field-collected samples This study did not involve samples collected from the field.
Ethics oversight The Duke University Institutional Animal Care and Use Committee provided ethical approval and guidance.
Note that full information on the approval of the study protocol must also be provided in the manuscript.
Article
The human gut microbiota is considered a major modulator of the immune system during development3 and in health and disease8,9. For example, preterm infants have distinct microbiome compositions and distinct developmental trajectories of peripheral immune cell populations3. In adults, the success of immunotherapies that rely on peripheral immune cells, such as checkpoint inhibitor treatments, has been associated with the composition of the microbiome11–13,15. There is an increasing interest in using the microbiome to modulate the immune system and augment treatments7,16, including the growing field of chimeric antigen receptor T cell therapy17. However, our understanding of how the microbiota influences the dynamics of immune cells in humans and how this compares to deliberate immunomodulatory interventions remains limited owing to a lack of feasible experiments in human subjects.
To overcome this limitation, we investigated whether the gut microbiota could influence day-by-day changes in peripheral immune cell counts. We collected a vast dataset of immune-reconstitution dynamics after allogeneic haematopoietic cell transplantation (HCT) from individuals treated at Memorial Sloan Kettering Cancer Centre (MSK) between 2003 and 2019 (Fig. 1a, Supplementary Table 1). The conditioning regimen of radiation and chemotherapy administered to patients before HCT is the most severe perturbation to the immune system deliberately performed in humans: this offers a unique opportunity to investigate links between the gut microbiota and immune dynamics directly in humans.
Conditioning depletes white blood cell (WBC) counts, leading to neutropenia (less than 500 neutrophils per μl blood) until transplanted stem cells begin to release granulocytes from the bone marrow, initiating immune reconstitution (Fig. 1a–c). HCT also damages the gut microbiota18 and reduces its biodiversity (Fig. 1d–i), a collateral effect associated with increased mortality in patients undergoing HCT19. Immune and microbiome reconstitution vary considerably between patients and treatment types (Fig. 1, Extended Data Fig. 1a), enabling analyses of associations between microbiome and immune system, and their comparison with immunomodulators such as granulocyte-colony stimulating factor (GCSF).
To detect a directional and causal link between the microbiota and circulatory WBCs, we first used data from a randomized trial of autologous
1Institute for Computational Medicine, NYU Langone Health, New York, NY, USA. 2Computational and Systems Biology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer
Center, New York, NY, USA. 3Adult Bone Marrow Transplantation Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA. 4Weill Cornell Medical
College, New York, NY, USA. 5Infectious Disease Service, Department of Medicine, and Immunology Program, Sloan Kettering Institute, New York, NY, USA. 6Harvard University, T. H. Chan
School of Public Health, Boston, MA, USA. 7Sahlgrenska Cancer Center, Department of Microbiology and Immunology, Institute of Biomedicine, University of Gothenburg, Gothenburg,
Sweden. 8Department of Laboratory Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA. 9Division of Hematologic Malignancies and Cellular Therapy, Duke University
School of Medicine, Durham, NC, USA. ✉e-mail: jonas.schluter@nyulangone.org; xavierj@mskcc.org
[Fig. 1 image. Panels a–c, neutrophil, lymphocyte and monocyte counts (×1,000 μl–1) against time after HCT (d), days 0–56, with GCSF marked. Panels d–f, diversity against time after HCT (d); panel d shows the mean (n = 1,294). Panels g–i, relative abundance against time after HCT (d); panel g shows the mean (n = 1,294). Legend of bacterial families: Akkermansiaceae, Enterobacteriaceae, Lactobacillaceae, Actinomycetaceae, Lachnospiraceae, Bacteroidaceae, Enterococcaceae, Peptostreptococcaceae, Streptococcaceae, Bifidobacteriaceae, Erysipelotrichaceae, Ruminococcaceae, Veillonellaceae, Clostridiaceae 1, Staphylococcaceae, other families.]
Fig. 1 | Immune reconstitution and microbiome dynamics after HCT. a–c, Major phases of HCT: immunoablation during conditioning before HCT on day 0 (I) is followed by post-HCT neutropenia (II) and reconstitution (III). Daily mean counts (shaded area indicates s.d.) of neutrophils, lymphocytes and monocytes from individuals receiving transplants between 2003 and 2019 (a), compared with those from individuals (b, c) representative of the recovery trajectories for different stem-cell graft sources. Patient 1 received a PBSC graft and patient 2 received umbilical-cord blood. Line with circles shows data from the patient; solid line and shaded region show mean ± s.d. for all patients receiving transplants from the same source. In a, coloured bars on the left indicate the range of cell counts in healthy individuals. d–f, Loss of microbial diversity during HCT, measured by 16S rRNA gene sequencing of faecal samples, supporting previous smaller studies23,24. In d, the line shows daily mean across patients, shaded area shows s.d. e, f, Data from individual patients. g–i, Relative abundance of commensal bacteria families. g, Mean (± s.d.) relative abundance across all patients. h, i, Relative abundance in individual patients.
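The article defines neutropenia as fewer than 500 neutrophils per μl of blood and neutrophil engraftment as 3 consecutive days with over 500 neutrophils per μl. The Python sketch below shows one way to locate that transition in a series of daily counts. The function name, the input layout (one count per day, no missing days) and the example counts are hypothetical, and reporting the first of the three days as the engraftment day is an assumption rather than something the text states.

```python
def engraftment_day(daily_neutrophils_per_ul, threshold=500, run_length=3):
    """Return the index of the first day of the first run of `run_length`
    consecutive days with counts above `threshold`, or None if there is none."""
    run = 0
    for day, count in enumerate(daily_neutrophils_per_ul):
        # Extend the current run, or reset it when the count is at or below the threshold.
        run = run + 1 if count > threshold else 0
        if run == run_length:
            return day - run_length + 1
    return None


# Hypothetical daily neutrophil counts per microlitre, for illustration only.
example_counts = [100, 600, 400, 550, 700, 900, 1200]
print(engraftment_day(example_counts))
```

A single day above the threshold (index 1 in the example) does not count, because the following day drops back below 500.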
faecal microbiota transplantation (auto-FMT), a controlled microbiota manipulation experiment performed directly on our patients20 (Extended Data Fig. 2a). To investigate whether auto-FMT affected WBC reconstitution, we compared the neutrophil, lymphocyte and monocyte counts after neutrophil engraftment in 24 individuals (engraftment defined as 3 consecutive days with over 500 neutrophils per μl). FMTs were conducted at variable dates relative to neutrophil engraftment (Fig. 2a, Supplementary Table 2). Overall, we observed higher counts of each WBC type in individuals who received an auto-FMT during the first 100 days after neutrophil engraftment (P < 0.001, Fig. 2b, c; total WBCs, Extended Data Fig. 2b–g).
The higher WBCs in individuals receiving auto-FMT could result from the successful reconstitution of a complex microbiota20 and associated metabolic capabilities21, or they could be a systemic response to a severe therapy that introduced billions of intestinal organisms at once via an enema (no enema was administered to controls20). Moreover, chance differences in extrinsic factors such as different immunomodulator medications may have affected this result owing to the small cohort size. Nonetheless, the results supported the notion that the microbiota can modulate the peripheral immune system. High lymphocyte counts during immune reconstitution have been associated with improved clinical outcomes22, and 3-year survival was positively associated with higher mean levels of WBCs during the 100 days after neutrophil engraftment in the individuals receiving HCT (hazard ratio = 0.91, P = 0.04). Identifying the taxa that modulate immune dynamics could therefore open new ways to improve immune reconstitution, which is critical for clinical outcomes.
To investigate the links between the gut microbiota and the dynamics of WBC recovery, we turned to our large observational cohort of patients receiving HCT. Homeostasis of circulatory WBC counts is a complex, dynamic process: WBCs are formed and released into the blood de novo by differentiation of haematopoietic progenitor cells from the bone marrow, and can be mobilized from thymus and lymph nodes (lymphocytes), and spleen, liver and lungs (neutrophils); WBCs can also migrate from the blood to other tissues when needed23. To identify modulators of these dynamic processes, we developed a two-stage approach analysing the changes of WBC counts between two days (Fig. 3a). Stage 1 served as a clinical- and metadata-feature-selection stage using blood and medication data of 1,096 patients without available microbiome information (Extended Data Fig. 1b shows data inclusion). Stage 2 was performed on data from an independent cohort of 841 different patients from whom concurrent microbiome samples were available to detect associations between microbiome and peripheral immune cell dynamics.
In stage 1, we analysed the changes in neutrophils, lymphocytes and monocytes during recovery from more than 20,000 pairs of post-engraftment blood samples separated by a single day (Fig. 3b). A cross-validated feature-selection approach detected medications and HCT parameters associated with WBC dynamics (Extended Data Fig. 3a–c, Supplementary Table 3). In stage 2, we sought to identify the additional contribution of the gut microbiome. We performed Bayesian inferences using data from different sets of patients with available microbiome samples (Supplementary Table 4). Stage 1 had identified, as expected, that the sources of stem-cell grafts are associated with immune reconstitution kinetics (for example, umbilical-cord blood is associated with slower kinetics than peripheral blood24 (peripheral blood stem cells (PBSCs))), and we therefore stratified our patients by graft source in stage 2. The dynamic systems model of stage 2 thus included bacterial genera as predictors of daily changes in WBC counts, in addition to the medications selected in stage 1, clinical features (conditioning intensity, age and sex), and the current state of the blood in the form of counts of neutrophils, lymphocytes, monocytes,
[Figure panel labels: (×1,000 μl–1); log(Wt+1) – log(Wt); PC 1 (EV: 48%); PC 2 (EV: 23%); No. of samples; No. of recipients; Graft type (PBSC, BM, TCD, cord); Microbiome; V-score; GCSF, MM, Cetirizine; Monocytes, Eosinophils, Neutrophils, Lymphocytes, Platelets; Rothia, Faecalitalea; Posteriors; Effect on ΔNeutrophils, ΔLymphocytes, ΔMonocytes.]
[...] immunomodulatory medications, clinical metadata and the state of the microbiome. b, Visualization of the WBC dynamics data. Scatter plot of the principal components (PC) of observed daily changes of neutrophils, lymphocytes and monocytes without (grey; n = 20,751 (after neutrophil engraftment), 81,253 (total)) and with (orange; n = 2,615 (after neutrophil engraftment), 6,297 (total)) available concurrent microbiota samples. EV, explained variance. c, Recipients of PBSCs (N = 312) provided most paired blood dynamics and microbiota samples (n = 995). Datasets from recipients of stem cells from TCD, bone marrow (BM) and umbilical cord (cord) grafts were used for validation. d–f, Bayesian inference results from PBSC graft recipient data. d, Posterior coefficient distributions of associations between treatments (colour shows more than 95% posterior density (PD) of coefficient estimates greater than zero (red) or less than zero (blue)). MM, mycophenolate mofetil. e, WBC counts. f, Relative abundances of microbial genera and daily changes [...]
[Figure panel labels, f–i: Effect on ΔNeutrophils, ΔLymphocytes, ΔMonocytes; Faecalibacterium, Ruminococcus 2, Akkermansia; Relative abundance; Sample index; 100 highest, 100 lowest; Simulated neutrophil count; Simulated days; + GCSF, – GCSF; Probability (%); Time to 2,000 neutrophils per μl (d).]
[...] direction, we saw a positive association of lymphocyte counts with [R.] gnavus group growth rates. Ruminococcus gnavus is associated with inflammatory bowel diseases31 and autoimmune disorders10; our analysis suggests that it may drive high neutrophil-to-lymphocyte ratios that are broadly characteristic of poor disease outcomes in inflammatory bowel diseases32 and other conditions33,34.
Several of the bacterial taxa positively associated with WBC rates were obligate anaerobes, some of which produce cell-wall molecules1,35 and short-chain fatty acids36 that modulate immune responses and granulopoiesis37. Ruminococcus 2, for example, contains keystone species that release nutrients from complex dietary starch38, and such nutritional support from the microbiota improved haematopoietic reconstitution in mice21. To identify a similar association in our patients, we estimated the microbiota reconstitution potency of each sample (Methods). Shotgun metagenomic sequences from 124 of our samples revealed that samples with positive microbiota potency were enriched in cholate degradation, vitamin B1 synthesis and butanoate formation pathways (Extended Data Fig. 8). In line with evolutionary theory39, our results suggest that such broadly available microbial traits may be co-opted by the host as part of the homeostatic interplay between immune system and microbiota. The genera Faecalibacterium, Ruminococcus 2 and Akkermansia that we associated with increased WBC rates were also among those best reconstituted by auto-FMT20, potentially explaining the higher counts of neutrophils, monocytes and lymphocytes in auto-FMT-treated individuals.
Our analyses show that the gut microbiome is associated with [...] be interpreted as net effects, since they do not, for example, distinguish the effect of the microbiota on de novo haematopoiesis from its effect on other sources and sinks. Unlike the plausible role of obligate anaerobe fermenters in augmenting haematopoiesis via nutritional support21, the positive association detected between Staphylococcus and lymphocyte dynamics could instead result from reduced extravasation of T cells from circulation into the gut epithelium40, especially since high abundances of Staphylococcus are associated with low gut microbiota diversity (P < 0.001, Extended Data Fig. 9a), which indicates a depleted microbiota.
Nevertheless, our approach enables us to leverage the chronology of events and assess ‘mathematical causality’41. Owing to the observational nature of these data, there are risks of confounding, for example, from undetected infections or dietary components, which could explain some of the associations, but the close temporal correspondence41 between microbiota and WBC dynamics reduces the number of plausible confounders. Notwithstanding potential confounders, our results suggest candidate bacterial taxa that might improve immune reconstitution, and focused follow-up studies are required to evaluate their immunomodulatory efficacy. Members of Faecalibacterium, Ruminococcus12 in one study, and Akkermansia11 in another have been associated with better responses to anti–PD-1 immunotherapy, which suggested a disagreement regarding involved taxa42. Our results, however, identified Faecalibacterium, Ruminococcus 2 and Akkermansia as the taxa with the strongest associations with immune cell dynamics, agreeing with the findings of both these previous studies that these taxa are associated with human immune modulation. Furthermore, our work enables us to directly compare their inferred effect sizes with the effects of immunomodulatory drugs. These genera are common in
$$\frac{\Delta \ln(W)}{\Delta t} = g_r + \sum_{j=1}^{P} \beta_j X_j, \qquad \mu = g_r + \sum_{j=1}^{\hat{P}} x_j \beta_j$$
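As a minimal sketch of the regression form above, the following Python computes the daily log-change of a cell count from two samples taken a known number of days apart and evaluates the linear predictor g_r + Σ β_j X_j. The function names, the counts, the coefficients and the predictor values are hypothetical placeholders chosen for illustration; they are not estimates from the study, and the study's own implementation may differ.

```python
import math


def daily_log_change(w_t, w_next, dt_days=1.0):
    """Return (ln(w_next) - ln(w_t)) / dt_days for two positive counts."""
    if w_t <= 0 or w_next <= 0:
        raise ValueError("counts must be positive to take logarithms")
    return (math.log(w_next) - math.log(w_t)) / dt_days


def linear_predictor(intercept, betas, xs):
    """Evaluate intercept + sum_j betas[j] * xs[j]."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    return intercept + sum(b * x for b, x in zip(betas, xs))


# Hypothetical values: a count rising from 1.0 to 1.5 (x1,000 per ul) in one day,
# and two made-up predictors with made-up coefficients.
observed = daily_log_change(1.0, 1.5)
predicted = linear_predictor(0.1, [0.3, -0.05], [1.0, 2.0])
print(round(observed, 4), round(predicted, 4))
```

Working on the log scale means each coefficient acts on the relative (per cent per day) rate of change rather than on the absolute count.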
Extended Data Fig. 1 | Blood cell counts over time. a, WBC counts and platelet counts per graft source over the first 100 days post HCT per day relative to HCT
from N = 2,235 adult patients (detailed demographics in Supplementary Table 1); lines: mean, shaded: ± standard deviation. b, Data exclusion diagram.
Extended Data Fig. 2 | FMT increases WBC counts. a, HCT patient who received an autologous faecal microbiota transplant (auto-FMT, dashed red line) that restored commensal microbial families and ecological diversity in the gut microbiota, with concurrent cell counts of peripheral neutrophils, lymphocytes and monocytes and immunomodulatory drug administrations. b, Total WBC counts in 24 enrolled patients (10 control, 14 treated) post-neutrophil engraftment; vertical lines indicate randomization dates. c, Weekly mean WBC counts aligned to the randomization date (FMT-treated: red, control: black). Line: mean per week, shaded region: 95% CI. d, Coefficient estimates (mean vs. mean + FMT effect) from linear mixed effects models of total WBC counts over time indicate an auto-FMT-induced increase of WBCs (βFMT: P = 7 × 10⁻¹⁴). e–g, Respectively: neutrophil, lymphocyte and monocyte count trajectories of 24 FMT trial patients. Thin lines: raw data (blue: post-FMT); thick black: mean per day, thick blue: mean + post-FMT coefficient. Means and confidence intervals (shaded region) without (black) and after FMT (blue), as well as the coefficient estimate for FMT treatment and its P value from a linear mixed effects model relating cell counts over time to the FMT treatment (Methods).
Article
Extended Data Fig. 3 | Results of the feature selection stage 1 regression. a–c, Stage 1 regression on neutrophil, lymphocyte, and monocyte dynamics, respectively, on patients without microbiome data. Coefficients from tenfold cross-validated elastic net regression daily changes in neutrophils. gr: intercept; TCD: T cell depleted graft (ex-vivo) by CD34+ selection; PBSC: peripheral blood stem cells; BM: bone marrow; cord: umbilical cord blood; NONABL: Nonmyeloablative; REDUCE: reduced-intensity conditioning regimen; F: female; N: patients, n: samples (daily changes in neutrophils).
Extended Data Fig. 4 | Additional coefficients, posterior convergence evaluation and validation. a–c, Additional posterior coefficient estimates of medications, additional genera and HCT metadata from the Bayesian stage 2 regression, see also Fig. 3. REDUCE: reduced-intensity conditioning regimen; NONABL: non-myeloablative conditioning regimen. F: female. d–f, Posterior sampling convergence. Histograms of the ranked posterior draws from the model of neutrophil, lymphocyte and monocyte dynamics, respectively, in PBSC patients (ranked over all chains), plotted separately for each chain, show no substantial differences between chains. g–i, Predictors of WBC dynamics using data from patients treated at Duke. Heatmaps indicate the slope coefficients from individual univariate regressions of microbiome and clinical predictors with changes in neutrophils, lymphocytes and monocytes, and for comparison the corresponding coefficient signs from the Bayesian multiple linear regressions in stage 2 of the analysis of WBC dynamics in MSK patients (Fig. 3). P values were adjusted for multiple hypothesis testing using Bonferroni correction: ***P < 0.001, **P < 0.01, *P < 0.05; P > 0.05: n.s. Sign of coefficients from MSK PBSC patients for comparison. j, Equivalent validation analysis from patients treated at Duke using partial least squares regression of microbiome and clinical predictors identified in stage 2 of our analysis on daily changes in neutrophils, lymphocytes and monocytes.
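The Bonferroni adjustment and star notation used in this legend can be illustrated with a short Python sketch. It multiplies each raw P value by the number of tests (capped at 1) and maps the adjusted value to the legend's labels. The function names and the raw P values are hypothetical, and treating a value of exactly 0.05 as not significant is an assumption, since the legend only defines P < 0.05 and P > 0.05.

```python
def bonferroni_adjust(p_values):
    """Multiply each P value by the number of tests, capping at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significance_label(p_adjusted):
    """Map an adjusted P value to the star notation used in the legend."""
    if p_adjusted < 0.001:
        return "***"
    if p_adjusted < 0.01:
        return "**"
    if p_adjusted < 0.05:
        return "*"
    return "n.s."  # assumption: exactly 0.05 is also reported as n.s.


# Hypothetical raw P values from four univariate regressions.
raw = [0.0001, 0.004, 0.02, 0.5]
adjusted = bonferroni_adjust(raw)
labels = [significance_label(p) for p in adjusted]
print(labels)
```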
Extended Data Fig. 5 | Validation using absolute instead of relative abundance bacterial genus data. a–d, Validation analysis of the main model using absolute bacterial abundances as predictors instead of relative abundances in Fig. 3. Results show inferred coefficients and P values from multiple linear regressions. One regression per analysed WBC type dynamics, that is, neutrophil, lymphocyte and monocyte daily log-changes, was conducted, and coefficients for medications (a), WBC feedbacks (b), metadata (c) and total genus abundances (d) are shown. This was possible only for a subset of the data used in the main analysis for which we obtained absolute bacterial abundance estimates (Methods). n: samples, N: patients.
Extended Data Fig. 6 | Jointly inferred association network between WBC and bacterial genus dynamics. Strong regularization yields few non-zero
coefficients and antibiotics dominate the dynamics.
Extended Data Fig. 7 | Jointly inferred association network between WBC and bacterial genus dynamics with reduced regularization. Reducing regularization strength (Methods) indicates potential bidirectional feedbacks, for example, between lymphocytes and [Ruminococcus] gnavus group (highlighter green boxes, and cartoon).
Extended Data Fig. 8 | Functional analysis of microbiota samples. To distinguish samples predicted to increase rates of WBCs, a microbiota potency score was calculated from posterior coefficients (Fig. 3, Methods) and the relative abundance of taxa in samples. Bars show linear discriminant analysis (LDA) scores of MetaCyc pathway profiles from 124 shotgun sequenced samples that distinguished positive and negative potency samples the most (LDA-score magnitude in the 95th percentile). Highlighted pathways are discussed in the main text. For each pathway, we tested whether pathway presence was enriched (depleted) in positive (negative) potency samples using one-sided Fisher’s exact test; ***P < 0.001, **P < 0.01, *P < 0.05.
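A one-sided Fisher's exact test for enrichment, as named in this legend, can be sketched in plain Python from the hypergeometric distribution. The 2×2 layout assumed here (rows: positive or negative potency samples; columns: pathway present or absent), the function name and the counts are hypothetical illustrations and do not come from the study data.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided P value for enrichment in the table [[a, b], [c, d]].

    Returns P(X >= a), where X is hypergeometric with the table margins fixed.
    """
    row1 = a + b          # e.g. positive-potency samples
    col1 = a + c          # e.g. samples in which the pathway is present
    n = a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)


# Hypothetical counts: pathway present in 8 of 10 positive-potency samples
# and in 1 of 6 negative-potency samples.
p_enriched = fisher_exact_greater(8, 2, 1, 5)
print(round(p_enriched, 4))
```

Testing depletion instead only requires swapping the two rows (or the two columns) before calling the function.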
Extended Data Fig. 9 | Abundance profiles of bacterial genera across analysed samples. a, The relative non-zero abundance of Staphylococcus is inversely related to microbiome alpha diversity, bold line: regression line from a linear model of the mean of the log10 Staphylococcus relative abundance, shaded: 95% confidence intervals (n = 1,381 samples with non-zero Staphylococcus abundances). b, Abundance profiles of the two genera, Faecalibacterium and Ruminococcus 2, most strongly associated with WBC increase; number of times detected (left) and log10 abundance distribution when above detection (right).
Extended Data Fig. 10 | Survival analysis and confirmation of model results with different priors. a, Kaplan–Meier plot of patient 3-year survival with sufficient available blood data (Supplementary Information, Extended Data Fig. 1). b, Posterior association coefficients do not depend on the choice of prior for σ in the main Bayesian model. Plotted are the posterior means from our main analysis against the equivalent inference with an inverse Gamma prior (alpha = 1, beta = 1).
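For readers unfamiliar with the Kaplan–Meier estimator behind panel a, the following is a minimal plain-Python sketch of the product-limit calculation. The function name and the follow-up data are hypothetical; it assumes the usual convention that subjects censored at a given time are still counted as at risk for deaths at that time.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    times  : follow-up time for each subject
    events : 1 if death was observed at that time, 0 if the subject was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed
    return curve


# Hypothetical follow-up (years) for five subjects.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 0, 1])
print(curve)
```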
nature research | reporting summary
Corresponding author(s): Jonas Schluter, Joao B. Xavier
Last updated by author(s): Aug 8, 2020
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see our Editorial Policies and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Data analysis python3.7.0, R3.6.1, DADA2, Humann2, ChocoPhlAn, MetaCyc, PyMC3, sklearn-0.23.2, https://github.com/jsevo/wbcdynamics_microbiome/
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and
reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability
The data used in our study are organized in Excel-compatible comma-separated value files as supplementary tables (data-tables.zip). All sequencing data have been
made available publicly, and the NCBI SRA accession numbers are listed in the tables below:
1. cGENUS.csv: relative taxon abundances in fecal microbiota samples from 12,633 stool samples
2. cHCTMETA.csv: HCT characteristics
3. cINFECTIONS.csv: positive blood culture results
4. cMISAMPLES.csv: NCBI SRA accession number, diversity (inverse Simpson index), total 16S (where available), stool consistency for each fecal microbiota sample
5. cMED.csv: medication data
6. cPIDMETA.csv: anonymized patient demographics
7. cWBC.csv: absolute counts of neutrophils, lymphocytes, monocytes, eosinophils, and platelets with indication if included in analyses
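The diversity measure named in item 4, the inverse Simpson index, can be computed from the taxon abundances of a single sample as 1 / Σ p_i², where p_i is the relative abundance of taxon i. The sketch below is a plain-Python illustration with a hypothetical function name and made-up abundances; it does not read the listed files, whose column layout is not described here.

```python
def inverse_simpson(abundances):
    """Inverse Simpson index 1 / sum(p_i^2) for one sample.

    `abundances` may be counts or relative abundances; they are renormalized.
    """
    total = sum(abundances)
    if total <= 0:
        raise ValueError("sample must contain at least one non-zero abundance")
    proportions = [a / total for a in abundances]
    return 1.0 / sum(p * p for p in proportions)


# Hypothetical samples: four equally abundant taxa versus one dominant taxon.
even_sample = inverse_simpson([0.25, 0.25, 0.25, 0.25])
dominated_sample = inverse_simpson([0.97, 0.01, 0.01, 0.01])
print(even_sample, round(dominated_sample, 3))
```

The index equals the number of taxa when all are equally abundant and approaches 1 as a single taxon dominates.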
Metadata and processed sequencing data are made available on a public repository via Figshare:
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf
Data exclusions Non-adults and non-engrafted patients were excluded. Data from patients without valid two-day apart sample pairs were excluded. The
supplementary methods provide a detailed flow chart of data inclusion/exclusion.
Replication All analyses can be reproduced with the code and data provided. No experiments were conducted.
Randomization N/A as no trial was conducted for this study. The randomized data was previously published and the randomization procedure is explained in
the relevant reference (Taur et al. 2018)
Recruitment No patients were specifically recruited for this work. Allo-HCT patients since 2003 were considered and included or excluded
as detailed in the data inclusion/exclusion section.
Ethics oversight The participants in the auto-FMT trial (NCT02269150) provided written informed consent to participate in the trial (#14-025).
Ethics oversight Participants in the observational cohorts at both Memorial Sloan Kettering Cancer Center and at Duke University School of
Medicine provided written informed consent for the use of their fecal specimens and clinical data. The use and analysis of
Clinical data
Policy information about clinical studies
All manuscripts should comply with the ICMJE guidelines for publication of clinical research and a completed CONSORT checklist must be included with all submissions.
Data collection This is a randomized, open-label, controlled study designed to assess the efficacy of autologous fecal microbiota transplantation
(auto-FMT) for prevention of Clostridium difficile infection (CDI) in patients who have undergone allogeneic hematopoietic stem cell
transplantation (allo-HSCT). Patients will be enrolled prior to allo-HSCT; feces will be collected and stored from all participating
subjects prior to the initiation of conditioning regimens, analyzed by deep 16S rRNA gene sequencing, and tested by assay for
intestinal pathogens including Clostridium difficile. Later in the course of transplantation, following engraftment (defined as the first
day of three consecutive days on which the absolute blood neutrophil count is above 500 per mm3), subjects will undergo fecal testing
for presence of Bacteroidetes by 16S PCR. Subjects will be eligible for study if they have a microbiologically diverse pre-transplant
colonic microbiota, and if the post-engraftment specimen contains Bacteroidetes at a prevalence equal to or below (0.1%)
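The engraftment definition in this description (the first day of three consecutive days with an absolute neutrophil count above 500 per mm3) can be expressed as a short Python function. This is a sketch under stated assumptions: one measurement per calendar day with no missing days, a hypothetical function name and made-up counts.

```python
def engraftment_day(daily_anc, threshold=500, run_length=3):
    """Return the index of the first day of the first run of `run_length`
    consecutive days with a count strictly above `threshold`, or None."""
    run = 0
    for day, count in enumerate(daily_anc):
        run = run + 1 if count > threshold else 0
        if run == run_length:
            return day - run_length + 1
    return None


# Hypothetical daily absolute neutrophil counts (per mm3), day 0 onwards.
counts = [100, 600, 400, 550, 700, 900, 300]
day = engraftment_day(counts)
print(day)
```

Here the single value above threshold on day 1 does not count, because the run is broken on day 2.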
Outcomes Primary Outcome: Clostridium difficile infection (CDI) [ Time Frame: up to 1 year following randomization ]
CDI is defined as diarrheal stool (unformed stool conforming to the shape of a specimen container), and a positive test for toxin-
producing C. difficile (either by toxin B gene PCR or cytotoxicity assay).
Article
There is increasing evidence that coronavirus disease 2019 (COVID-19) produces more
severe symptoms and higher mortality among men than among women1–5. However,
whether immune responses against severe acute respiratory syndrome coronavirus 2
(SARS-CoV-2) differ between sexes, and whether such differences correlate with the
sex difference in the disease course of COVID-19, is currently unknown. Here we
examined sex differences in viral loads, SARS-CoV-2-specific antibody titres, plasma
cytokines and blood-cell phenotyping in patients with moderate COVID-19 who had
not received immunomodulatory medications. Male patients had higher plasma
levels of innate immune cytokines such as IL-8 and IL-18 along with more robust
induction of non-classical monocytes. By contrast, female patients had more robust
T cell activation than male patients during SARS-CoV-2 infection. Notably, we found
that a poor T cell response negatively correlated with patients’ age and was associated
with worse disease outcome in male patients, but not in female patients. By contrast,
higher levels of innate immune cytokines were associated with worse disease
progression in female patients, but not in male patients. These findings provide a
possible explanation for the observed sex biases in COVID-19, and provide an
important basis for the development of a sex-based approach to the treatment and
care of male and female patients with COVID-19.
SARS-CoV-2 is the novel coronavirus first detected in Wuhan, China, in November 2019 that causes COVID-196. On 11 March 2020, the World Health Organization (WHO) declared COVID-19 a pandemic7. A growing body of evidence reveals that male sex is a risk factor for a more severe disease, including death. Globally, approximately 60% of deaths from COVID-19 are reported in men5, and a cohort study of 17 million adults in England reported a strong association between male sex and the risk of death from COVID-19 (hazard ratio 1.59, 95% confidence interval 1.53–1.65)8.
Past studies have shown that sex has a considerable effect on the outcome of infections and has been associated with underlying differences in immune responses to infection9,10. For example, the prevalence of hepatitis A and tuberculosis is notably higher in men than in women11. Viral loads are consistently higher in male patients with hepatitis C virus and human immunodeficiency virus (HIV)12,13. By contrast, women mount a more robust immune response to vaccines14. These findings collectively suggest a more robust ability among women to control infectious agents. However, the mechanism by which SARS-CoV-2 causes more severe disease in male patients than in female patients remains unknown.
To determine the immune responses against SARS-CoV-2 infection in male and female patients, we performed detailed analyses on the sex differences in immune phenotypes by the assessment of viral loads, levels of SARS-CoV-2-specific antibodies, plasma cytokines or chemokines, and blood-cell phenotypes.

Overview of the study design
Patients who were admitted to the Yale-New Haven Hospital between 18 March and 9 May 2020 and were confirmed positive for SARS-CoV-2
1Department of Immunobiology, Yale University School of Medicine, New Haven, CT, USA. 2Department of Epidemiology of Microbial Diseases, Yale School of Public Health, New Haven, CT, USA. 3Department of Medicine, Section of Infectious Diseases, Yale University School of Medicine, New Haven, CT, USA. 4Department of Biomedical Engineering, Yale School of Engineering & Applied Science, New Haven, CT, USA. 5Boyer Center for Molecular Medicine, Department of Microbial Pathogenesis, Yale University, New Haven, CT, USA. 6Department of Comparative Medicine, Yale University School of Medicine, New Haven, CT, USA. 7Department of Ecology and Evolutionary Biology, Yale University, New Haven, CT, USA. 8Department of Medicine, Section of Pulmonary and Critical Care Medicine, Yale University School of Medicine, New Haven, CT, USA. 9Department of Laboratory Medicine, Yale University School of Medicine, New Haven, CT, USA. 10Center for Outcomes Research and Evaluation, Yale-New Haven Hospital, New Haven, CT, USA. 11Yale Institute for Global Health, Yale University, New Haven, CT, USA. 12Yale School of Nursing, Yale University, Orange, CT, USA. 13Howard Hughes Medical Institute, Chevy Chase, MD, USA. 21These authors contributed equally: Takehiro Takahashi, Mallory K. Ellingson, Patrick Wong, Benjamin Israelow, Carolina Lucas, Jon Klein, Julio Silva, Tianyang Mao. *A list of authors and their affiliations appears at the end of the paper. ✉e-mail: akiko.iwasaki@yale.edu
[...] nasopharyngeal swabs, saliva, urine and stool samples were collected at study enrolment (baseline denotes the first time point) and longitudinally on average every 3 to 7 days (serial time points). The detailed demographics and clinical characteristics of these 98 participants are shown in Extended Data Table 1. Plasma and peripheral blood mononuclear cells (PBMCs) were isolated from whole blood, and plasma was used for titre measurements of SARS-CoV-2 spike S1 protein-specific IgG and IgM antibodies (anti-S1-IgG and -IgM) and cytokine or chemokine [...] and longitudinal, as described below. As a control group, healthcare workers (HCWs) from Yale-New Haven Hospital were enrolled who were uninfected with COVID-19. Demographics and background information for the HCW group and the demographics of HCWs for cytokine assays and flow cytometry assays for the primary analyses are in Extended Data Table 1. Demographic data, time-point information of the samples defined by the days from the symptom onset (DFSO) in each patient, treatment information, and raw data used to generate figures and tables are in Supplementary Table 1.

[Fig. 1 panel labels: log10[SARS-CoV-2 (copies ml–1)]; A450 nm; CCL8 (pg ml–1); groups M_HCW, F_HCW, M_Pt, F_Pt.]
Fig. 1 | Comparison of viral RNA concentrations, titres of anti-SARS-CoV-2 antibodies, and plasma cytokines and chemokine levels at the first sampling of cohort A patients. a, Comparison of viral RNA measured from nasopharyngeal (Np) swab and saliva. n = 14 for male and female patients (M_Pt and F_Pt, respectively) for nasopharyngeal samples, and n = 9 and 12, respectively, for saliva samples. Dotted lines indicate the detection limit of the assay (5,610 copies ml−1), and negatively tested data are shown on the x axis. ND, not detected. b, Titres of specific IgG and IgM antibodies against SARS-CoV-2 S1 protein were measured. n = 13, 74, 15 and 20 for IgG, and n = 3, 18, 15 and 20 for IgM, for male HCW (M_HCW), female HCW (F_HCW), M_Pt and F_Pt, respectively. The cut-off values for positivity are shown by the dotted lines. c, Comparison of the plasma levels of representative innate immune cytokines and chemokines. n = 15, 28, 16 and 19 for M_HCW, F_HCW, M_Pt and F_Pt, respectively. P values were determined by unpaired two-tailed t-test (a) or one-way analysis of variance (ANOVA) with Bonferroni multiple comparison test (b, c). All P values < 0.10 are shown. Data are mean ± s.e.m. The results of all the cytokines or chemokines measured can be found in Extended Data Fig. 1b.

Baseline analysis
The baseline analysis was performed on samples from the first time point of patients who met the following criteria: not in intensive care unit (ICU), had not received tocilizumab, and had not received high doses of corticosteroids (prednisone equivalent of more than 40 mg) before the first sample collection date. This patient group, cohort A, consisted of 39 patients (17 male and 22 female) (Extended Data Tables 1, 2). Intersex and transgender individuals were not represented in this study. Figures 1–4 represent analyses of baseline raw values obtained from patients in cohort A. In cohort A patients, male and female patients were matched in terms of age, body mass index (BMI), and DFSO at the first time point sample collection (Extended Data Fig. 1a). However, there were significant differences in age and BMI between HCW controls and patients (patients had higher age and BMI values) (Extended Data Table 1), and therefore an age- and BMI-adjusted difference-in-differences analysis was also performed in parallel (Extended Data Table 3).
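The difference-in-differences idea mentioned above can be illustrated with a minimal, unadjusted Python sketch: the patient-versus-HCW difference in men minus the same difference in women. The analysis described in the text additionally adjusts for age and BMI, which this sketch does not attempt; the function name and all measurement values are hypothetical.

```python
from statistics import mean


def difference_in_differences(male_pt, male_hcw, female_pt, female_hcw):
    """(male patients - male HCWs) - (female patients - female HCWs), using group means."""
    male_diff = mean(male_pt) - mean(male_hcw)
    female_diff = mean(female_pt) - mean(female_hcw)
    return male_diff - female_diff


# Hypothetical cytokine levels (pg per ml) in the four groups.
did = difference_in_differences(
    male_pt=[30, 50],
    male_hcw=[10, 10],
    female_pt=[20, 30],
    female_hcw=[10, 10],
)
print(did)
```

A positive value indicates that the measurement is elevated more in male patients relative to male controls than in female patients relative to female controls.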
Longitudinal analysis
As parallel secondary analyses, we performed longitudinal analysis on a total patient cohort (cohort B) to evaluate the difference in immune response over the course of the disease between male and female patients. Cohort B included all patient samples from cohort A (including several time-point samples from the cohort A patients) as well as an additional 59 patients who did not meet the inclusion criteria for cohort A. Because cohort B included more severely affected patients in ICU, the average clinical scores were higher in cohort B than in cohort A (mean ± s.d.: 1.3 ± 0.5 (female) and 1.4 ± 0.5 (male) for cohort A, and 2.5 ± 1.5 (female) and 2.7 ± 1.3 (male) for cohort B) (Extended Data Table 1). This analysis included several time-point samples from 98 participants in total. Data from cohort B were analysed for sex differences in immune responses among patients using longitudinal analysis, controlling for potential confounding by age, BMI, receipt of immunomodulatory treatment (tocilizumab or corticosteroids), DFSO and ICU status. Second, we conducted a longitudinal analysis that compared male and female patients with COVID-19 to male and female HCWs, controlling for age and BMI. Adjusted least square means difference over time in immune responses between male and female patients with COVID-19 (Extended Data Table 4) and adjusted least square means difference over time in immune responses between male and female patients with COVID-19 and male and female HCWs (Extended Data Table 5) were calculated.

Sex differences in cytokines and chemokines
We first compared the concentrations of viral RNA of male and female patients. For both cohorts A and B, there was no difference by sex in terms of the viral RNA concentrations in nasopharyngeal swab and saliva (Fig. 1a, Extended Data Tables 3, 4).
Anti-SARS-CoV-2 S1-specific IgG and IgM (anti-S1-IgG and -IgM) antibodies were comparable in infected male and female patients in cohort A (Fig. 1b) and in cohort B (Extended Data Tables 4, 5). Thus, at baseline and during the course of the disease, there were no clear differences in the amount of IgG or IgM generated against the S1 protein between male and female patients.
Next, we analysed the levels of 71 cytokines and chemokines in the plasma. Levels of many pro-inflammatory cytokines, chemokines and growth factors, including IL-1β, IL-6, IL-8, TNF, CCL2, CXCL10 and G-CSF, are increased in the plasma of patients with COVID-1916. In line with previous reports, levels of inflammatory cytokine or chemokine were
Fig. 1b, Extended Data Table 3). However, levels of IL-8 and IL-18 were
significantly higher in male patients than in female patients in cohort
male patients than in female patients, IL-8 and CXCL10 were significantly
increased in male patients compared to male HCWs than in
female patients compared to female HCWs (difference-in-differences,
Extended Data Table 3). In adjusted analyses of cohort B, although we
did not see significant sex differences in the levels of IL-8 and IL-18, we
found significantly higher levels of CCL5 in male patients than in female
patients over the course of the disease (Extended Data Table 4) and
Monocyte differences by sex
Next, we examined the immune cell phenotype by flow cytometry. Freshly
B cells, natural killer T cells, natural killer cells, monocytes, macrophages and dendritic cells to investigate the composition of PBMCs (Extended Data Fig. 2b). Consistent with a previous report on a decrease in T cells in patients¹⁶, in cohort A, the proportion of T cells in the live cells was significantly lower in patients, whereas the proportion of B cells was higher in both male and female patients than in HCWs (Fig. 2a, Extended Data Table 3). There was no difference in the numbers of B cells across all groups, but the numbers of T cells were lower in patients of both sexes (data not shown). By contrast, in cohort B, we found that male patients had significantly lower numbers of T cells, both total counts and as a proportion of live cells, over the course of the disease than female patients (Extended
Data Table 4). Next, we found higher populations of monocytes in both sexes in cohort A (Fig. 2b, c, Extended Data Fig. 2b) compared to HCWs. Although CD14+CD16− classical monocytes were comparable across all groups, levels of CD14+CD16+ intermediate monocytes were increased in patients compared with HCWs, and this increase was more robust in female patients (Fig. 2b, c). By contrast, male patients had higher levels of CD14loCD16+ non-classical monocytes than controls and female patients (Fig. 2b, c). These differences were observed in age- and BMI-adjusted analyses, too, but were not significant (Extended Data Table 3).
We then divided the 17 cohort A male patients into two groups, namely, a ‘high’ group who had high percentages of non-classical monocytes (upper quartile 4 patients, all had more than 5% of non-classical monocytes) and a ‘low-intermediate’ group (others, 13 patients). We compared age, BMI, DFSO, T cells, and plasma levels of IL-18 and CCL5. Although we found no differences in age, BMI or DFSO (Fig. 2d), we noted that the group with high levels of non-classical monocytes had significantly lower levels of T cells and higher levels of CCL5 in plasma (Fig. 2d). In addition, we found a significant correlation between CCL5 levels and abundance in non-classical monocytes only in male patients (Fig. 2e). These findings suggest that the progression from classical to non-classical monocytes may be arrested at the intermediate stage in female patients, and that increased innate inflammatory cytokines and chemokines are associated with more robust activation of innate immune cells at the baseline as well as more robust longitudinal T cell decrease in male patients.

Fig. 2 | Differences in composition of PBMCs between male and female patients in cohort A at the first sampling. a, Comparison on the proportion of B cells (top) and T cells (bottom) in live PBMCs. n = 6, 42, 16 and 21 for M_HCW, F_HCW, M_Pt and F_Pt, respectively. b, Representative 2D plots for CD14 and CD16 in monocytes gate (live/singlets/CD19−CD3−/CD56−CD66b−). Numbers in red indicate the percentages of each population in the parent monocyte gate. c, Comparison between percentages of total monocytes, classical monocytes (cMono), intermediate monocytes (intMono) and non-classical monocytes (ncMono) in the live PBMCs. n = 6, 42, 16 and 21 for M_HCW, F_HCW, M_Pt and F_Pt, respectively. d, Comparison of age, BMI, DFSO, T cells (percentage of live PBMCs) and plasma IL-18 and CCL5 levels between male patients who had high non-classical monocytes and low-intermediate non-classical monocytes. n = 13 and 4 for ‘low-int’ and ‘high’ group, respectively, for age, BMI and DFSO. n = 12 and 4 for ‘low-int’ and ‘high’ group, respectively, for T cells and IL-18 or CCL5 levels. e, Correlation between plasma CCL5 levels and non-classical monocytes (percentage of live cells). Pearson correlation coefficients (R) and P values for each sex are shown. Lines represent linear regression lines and shading represents 95% confidence intervals for each sex. ncMono-high male patients (n = 4) are shown with orange open squares, and ncMono-low-int male patients (n = 11) are shown with orange closed squares. n = 19 for female patients (purple circles). Data are mean ± s.e.m. in a, c and d. P values were determined by one-way ANOVA with Bonferroni multiple comparison test (a, c) or unpaired two-tailed t-test (d). All P values < 0.10 are shown.

generally higher in patients than in controls (Fig. 1c, Extended Data Figs. 1b, 2a, Extended Data Table 3). The levels of type-I, -II or -III interferon (IFN) were comparable between the sexes in cohort A (Extended Data Fig. 1b, Extended Data Table 3). However, we found higher levels of IFNα2 in female patients than in male patients in cohort B (Extended Data Table 4). The levels of many cytokines, chemokines and growth

Higher T cell activation in female
We further examined the T cell phenotype in patients with COVID-19. The composition of overall CD4-positive and CD8-positive cells among
Fig. 3 | Sex difference in T cell phenotype at the first sampling of cohort A patients. a, Percentages of CD4 and CD8 in the CD3-positive cells. b, Representative 2D plots for CD38 and HLA-DR in the CD4 and CD8 T cells. Numbers in red indicate the percentages of CD38+HLA-DR+ populations in the parent gate (live/singlets/CD3+/CD4+ or CD8+). c, Percentages of CD38+HLA-DR+ CD4 or CD8 cells in CD3-positive cells. d, Representative 2D plots for PD-1 and TIM-3 in the CD4 and CD8 T cells. Numbers in red indicate the percentages of PD-1+TIM-3+ populations in the parent gate (live/singlets/CD3+/CD4+ or CD8+/CD45RA−). e, Percentages of PD-1+TIM-3+ CD4 or CD8 cells in CD3-positive cells. n = 6, 45, 16 and 22 for M_HCW, F_HCW, M_Pt and F_Pt, respectively. P values were determined by one-way ANOVA with Bonferroni multiple comparison test. Data are mean ± s.e.m. All P values < 0.10 are shown.
T cells were similar between all groups in cohort A (Fig. 3a, Extended Data Fig. 2c, Extended Data Table 3). Detailed phenotyping of T cells for naive T cells, central or effector memory T (TCM/TEM) cells, follicular helper T (TFH) cells, regulatory T (Treg) cells revealed no remarkable differences in the frequency of these subsets between sexes (Extended Data Fig. 2c). However, we observed higher levels of CD38 and HLA-DR-positive activated T cells in female patients than in male patients (Fig. 3b, c). In parallel, PD-1- and TIM-3-positive terminally differentiated T cells were more prevalent among female patients than male patients (Fig. 3d, e). These findings were seen in both CD4 and CD8 T cells, but the differences were more robust in CD8 T cells (Fig. 3c, e, Extended Data Table 3). We also stained for intracellular cytokines such as IFNγ, granzyme B (GzB), TNF, IL-6 and IL-2 in CD8 T cells, and IFNγ, TNF, IL-17A, IL-6 and IL-2 in CD4 T cells. Levels of these cytokines were higher in patients than in controls, and were generally comparable between sexes in the patients (Extended Data Fig. 2d). Analyses of T cell phenotypes in cohort B did not reveal any significant differences between sexes (Extended Data Tables 4 and 5). Therefore, female patients with COVID-19 had more abundant activated and terminally differentiated T cell populations than male patients at baseline in unadjusted analyses.

Sex-dependent immunity and disease course
We investigated whether certain immune phenotypes were correlated with disease trajectory, and whether these phenotypes and factors differed between the sexes. To this end, we evaluated the disease course of patients in cohort A. The clinical scores at the first sample collection (C1) were 1 or 2 for all of the patients in cohort A. The patients were categorized into a ‘deteriorated’ group if the patients marked a score of 3 or higher after the first sample collection date as their maximum clinical scores during admission (Cmax). By contrast, if the patients maintained the score of 1 or 2, they were categorized as ‘stabilized’ (Extended Data Table 2). Both in male (n = 17) and female (n = 22) patients from cohort A, 6 patients of each sex deteriorated during the course of the disease (35.3% and 27.3%, respectively), and the intervals between the dates at which the patients reached Cmax (DFSO at Cmax) and the first sample collection (DFSO at C1) were not significantly different between deteriorated male and female patients (mean ± s.d. = 3.7 ± 4.1 and 4.2 ± 2.7, respectively; P = 0.81 by unpaired two-tailed t-test).
We first examined age, BMI, viral loads and titres of anti-S1-IgG antibodies between the stabilized and deteriorated groups in a sex-aggregated manner. We found that the deteriorated group had on average a higher BMI than the stabilized group. Although the age was not statistically different, the stabilized group spanned a larger age range than the deteriorated group, who were generally of a more advanced age. The viral load and antibody titres were comparable (Fig. 4a). Next, we examined these factors in a sex-disaggregated manner, and found that the deteriorated male (M_deteriorated) group was on average significantly older than the stabilized male (M_stabilized) group, whereas the two female groups (F_deteriorated and F_stabilized) were comparable in age (Fig. 4b). In addition, BMI was higher for the M_deteriorated than the M_stabilized group, whereas there was no difference in BMI between the F_deteriorated and F_stabilized groups (Fig. 4b). By contrast, the F_deteriorated group had higher viral load in saliva than the F_stabilized group, whereas there was no difference in the male groups (Fig. 4b). The levels of antibodies were comparable between the deteriorated and stabilized groups both in male and female, but stabilized female tended to have higher antibody levels (Fig. 4b).
We further investigated whether the key factors identified in the previous analyses correlated with disease progression in male and female patients. We observed that regardless of sex, some chemokines and growth factors, such as CXCL10 (also known as IP-10) and M-CSF, were increased in patients that went on to develop worse disease. However, there were some innate immune factors, such as CCL5, TNFSF10 (also known as TRAIL) and IL-15, that were specifically increased only in female patients that subsequently progressed to worse disease,
Fig. 4 | Differential immune phenotypes at the first sampling and disease progression between sexes in cohort A patients. a, b, Sex-aggregated (a) and sex-disaggregated (b) comparison of age, BMI, RNA concentration in nasopharyngeal swab and saliva, and anti-S1-IgG antibodies between the stabilized and deteriorated group. n = 11, 6, 16 and 6 for age and BMI, n = 9, 5, 9 and 5 for nasopharyngeal swab, n = 6, 3, 8 and 4 for saliva, and n = 10, 5, 14 and 6 for anti-S1-IgG antibodies, for M_stabilized, M_deteriorated, F_stabilized and F_deteriorated group, respectively. Dotted lines in the viral concentration and anti-S1-IgG panels indicate the detection limit and cut-off value for positivity, respectively. c, Cytokine or chemokine comparison between stabilized and deteriorated groups. n = 10, 6, 14 and 5 for the M_stabilized, M_deteriorated, F_stabilized and F_deteriorated groups, respectively. d, Comparisons in the proportions of activated (CD38+HLA-DR+) and terminally differentiated (PD-1+TIM-3+) CD4 or CD8 T cells, and IFNγ+CD8 T cells in CD3-positive T cells are shown. n = 10, 6, 16 and 6 for M_stabilized, M_deteriorated, F_stabilized and F_deteriorated group, respectively. e, Pearson correlation heat maps of the indicated parameters are shown for each sex. For viral RNA concentrations and cytokine or chemokine levels, log-transformed values were used for the calculation of the correlations. The size and colour of the circles indicate the correlation coefficient (R), and only statistically significant correlations (P < 0.05) are shown. Clinical deterioration from the first time point was scored by Cmax − C1. n = 17 and 22 for male and female, respectively. f, Correlation between age and CD38+HLA-DR+ CD8 T cells (left) and IFNγ+CD8 T cells (right). Pearson correlation coefficient (R) and P values for each correlation and sex are shown. Lines represent linear regression lines and shading represents 95% confidence intervals for each sex. P values were determined by unpaired two-tailed t-test in a–d. Data are mean ± s.e.m. All P values < 0.10 are shown.
but this difference was not observed in male patients (Fig. 4c). In the age- and DFSO-adjusted analysis of cohort A, we also found that CCL5 was only increased in female patients that progressed to worse disease compared to the stabilized patients, but no such correlation was found in male patients (Extended Data Table 6).
T cell phenotypes in these groups showed that male patients whose disease worsened had a significantly lower proportion of activated T cells (CD38+HLA-DR+) and terminally differentiated T cells (PD-1+TIM-3+) and tendencies for fewer IFNγ+ CD8 T cells at the first sample collection, compared with their male counterparts who did not progress to worse disease (Fig. 4d). However, in female patients, the deteriorated group had similar levels of these types of CD8 T cells compared with the stabilized group (Fig. 4d).
We finally examined the correlations between age, BMI, viral loads, anti-S1 antibodies, cytokines or chemokines, activated or terminally differentiated or IFNγ-producing CD8 T cells, and clinical disease course (‘Cmax − C1’ was used for the deterioration score). The correlation matrix showed that in female patients, higher levels of innate immunity cytokines, such as TNFSF10 and IL-15, were positively correlated with disease progression, whereas there was no association between CD8 T cell status and deterioration (Fig. 4e, results of age- and DFSO-adjusted analysis in Extended Data Table 6). In particular, CXCL10, M-CSF and IL-15 were positively correlated with IFNγ+CD8 T cells in female patients (Fig. 4d).
By contrast, in male patients, progressive disease was associated with higher age, higher BMI, and poor CD8 T cell activation (Fig. 4e).
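The grouping rule and the deterioration score used in these analyses can be illustrated with a short sketch. It assumes only what the text states: a patient is ‘deteriorated’ if Cmax is 3 or higher and ‘stabilized’ otherwise, the deterioration score is Cmax − C1, and cytokine levels are log-transformed before Pearson correlation. The function names, the base-10 logarithm and the five example patients are hypothetical and are not taken from the study data or code.

```python
import math

def classify_course(c1, cmax):
    """Return (group, deterioration score) for one patient.

    Rule as described in the text: 'deteriorated' if the maximum clinical
    score (Cmax) is 3 or higher, otherwise 'stabilized'; score = Cmax - C1.
    """
    group = "deteriorated" if cmax >= 3 else "stabilized"
    return group, cmax - c1

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical patients: (C1, Cmax, plasma cytokine level in pg/ml).
patients = [(1, 1, 120.0), (2, 2, 150.0), (1, 3, 480.0), (2, 4, 900.0), (2, 5, 2100.0)]

groups = [classify_course(c1, cmax)[0] for c1, cmax, _ in patients]
scores = [classify_course(c1, cmax)[1] for c1, cmax, _ in patients]

# Assumption: base-10 log transform of the cytokine level before correlating.
log_levels = [math.log10(level) for _, _, level in patients]
r = pearson_r(log_levels, scores)
```

In this toy input the cytokine level rises with the deterioration score, so `r` is positive. A full analysis along the lines of Fig. 4e would repeat the calculation for each parameter pair and each sex, and keep only the correlations that reach significance.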
Although our study provides a strong basis for further investigation into how COVID-19 disease dynamics may differ between male and female patients, it is important to note that there are some limitations to the analyses presented in this Article. First, we acknowledge that the healthy HCWs used as the control population were not matched to patients on the basis of age, BMI or underlying risk factors. To account for this, we performed adjusted analyses for the baseline and longitudinal comparisons between patients (cohort A and the full patient population, cohort B) and HCWs, controlling for age and BMI. However, we cannot rule out residual confounding due to underlying risk factors not available for the HCW controls.
Collectively, these data suggest that vaccines and therapies to increase T cell immune responses to SARS-CoV-2 might be warranted for male patients, whereas female patients might benefit from therapies that dampen innate immune activation early during disease. The immune landscape in patients with COVID-19 is considerably different between the sexes, and these differences may underlie heightened disease vulnerability in men.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© The Author(s), under exclusive licence to Springer Nature Limited 2020

Yale IMPACT Research Team*

Kelly Anastasio¹⁴, Michael H. Askenase¹⁵, Maria Batsu¹⁶, Hannah Beatty¹⁶, Santos Bermejo¹⁶, Sean Bickerton¹⁷, Kristina Brower², Molly L. Bucklin¹, Staci Cahill¹⁴, Melissa Campbell³, Yiyun Cao¹, Edward Courchaine¹⁷, Rupak Datta³, Giuseppe DeIuliis⁸, Bertie Geng¹⁶, Laura Glick¹⁶, Ryan Handoko¹⁶, Chaney Kalinich², William Khoury-Hanold¹, Daniel Kim¹, Lynda Knaggs¹⁶, Maxine Kuang¹⁴, Eriko Kudo¹, Joseph Lim¹⁸, Melissa Linehan¹, Alice Lu-Culligan¹, Amyn A. Malik¹¹, Anjelica Martin¹, Irene Matos¹⁶, David McDonald¹⁶, Maksym Minasyan¹⁶, Subhasis Mohanty³, M. Catherine Muenker², Nida Naushad¹⁶, Allison Nelson¹⁶, Jessica Nouws¹⁶, Marcella Nunez-Smith¹⁹, Abeer Obaid¹⁶, Isabel Ott², Hong-Jai Park¹⁶, Xiaohua Peng¹⁶, Mary Petrone², Sarah Prophet²⁰, Harold Rahming¹⁶, Tyler Rice¹, Kadi-Ann Rose¹⁶, Lorenzo Sewanan¹⁶, Lokesh Sharma⁸, Denise Shepard¹⁶, Erin Silva¹⁶, Michael Simonov¹⁶, Mikhail Smolgovsky¹⁶, Eric Song¹, Nicole Sonnert¹, Yvette Strong¹⁶, Codruta Todeasa¹⁶, Jordan Valdez¹⁶, Sofia Velazquez¹⁵, Pavithra Vijayakumar¹⁶, Haowei Wang³, Annie Watkins², Elizabeth B. White² & Yexin Yang¹

¹⁴Yale Center for Clinical Investigation, Yale University School of Medicine, New Haven, CT, USA. ¹⁵Department of Neurology, Yale University School of Medicine, New Haven, CT, USA. ¹⁶Yale School of Medicine, New Haven, CT, USA. ¹⁷Department of Biochemistry and of Molecular Biology, Yale University School of Medicine, New Haven, CT, USA. ¹⁸Yale Viral Hepatitis Program, Yale University School of Medicine, New Haven, CT, USA. ¹⁹Equity Research and Innovation Center, Yale University, New Haven, CT, USA. ²⁰Department of Molecular, Cellular and Developmental Biology, Yale University School of Medicine, New Haven, CT, USA.
Extended Data Fig. 1 | Comparison of basic clinical parameters of cohort A patient samples and plasma levels of 71 cytokines and chemokines at the first sampling of cohort A. a, Comparisons of age, BMI and DFSO at the first sampling between male and female patients in cohort A. n = 17 and 22 for M_Pt and F_Pt, respectively. b, Comparison of the plasma levels of 71 cytokines and chemokines. n = 15, 28, 16 and 19 for M_HCW, F_HCW, M_Pt and F_Pt, respectively. Data are mean ± s.e.m. P values were determined by unpaired two-tailed t-test (a) or one-way ANOVA with Bonferroni multiple comparison test (b). All P values < 0.10 are shown.
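As a worked illustration of the unpaired two-tailed t-test named in the caption above, the sketch below recomputes a P value from summary statistics alone. It uses the one comparison for which the main text reports mean, s.d., group size and P together: the interval from first sampling to Cmax in deteriorated male and female patients (3.7 ± 4.1 and 4.2 ± 2.7, six patients per sex, P = 0.81). The pooled-variance (Student) form is an assumption, since the text does not say whether equal variances were assumed, and the helper names are hypothetical. The P value is obtained by numerically integrating the t density so that only the standard library is needed.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed P value via Simpson's rule on the t density over [0, |t|]."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1 - 2 * area)

def student_t_from_summary(m1, s1, n1, m2, s2, n2):
    """Pooled-variance unpaired t-test from means, s.d. values and group sizes."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (m1 - m2) / se
    return t, df, two_tailed_p(t, df)

# Summary statistics quoted in the main text (deteriorated male vs female patients).
t_stat, df, p_value = student_t_from_summary(3.7, 4.1, 6, 4.2, 2.7, 6)
```

Under these assumptions the P value rounds to 0.81, in line with the value reported in the text. Because the two groups are the same size, a Welch test would give the same t statistic and differ only in its degrees of freedom.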
Extended Data Fig. 2 | Heat maps of cytokines and chemokines, PBMC composition, T cell subsets, and T cell cytokine expression at the first sampling of cohort A patients. a, A heat map of the plasma levels (pg ml−1) of 71 cytokines and chemokines. n = 15, 28, 16 and 19 for M_HCW, F_HCW, M_Pt and F_Pt, respectively. b, A heat map for the composition of PBMCs (percentage in live PBMCs). n = 6, 42, 16 and 21 for M_HCW, F_HCW, M_Pt and F_Pt, respectively. c, A heat map for the T cell subsets (percentage in CD3+ cells). n = 6, 45, 16 and 22 for M_HCW, F_HCW, M_Pt and F_Pt, respectively. d, A heat map for the intracellular cytokine staining of T cells (percentage in CD3+ cells). n = 6, 43, 16 and 22 for M_HCW, F_HCW, M_Pt and F_Pt, respectively. In all of these heat maps, log-transformed values were used for heat map generation.
Article
Extended Data Fig. 3 | Flow cytometry gating strategy. a–c, Gating strategy used for monocytes (a), CD38+HLA-DR+ and PD-1+TIM-3+ CD4 or CD8 T cells (b), and
T cell intracellular staining for IFNγ+ CD8 T cells (c).
Extended Data Table 1 | Demographic and clinical characteristics of cohort A, cohort B and HCW comparison groups
*Ethnicity: (1) American Indian/Alaskan native; (2) Asian; (3) Black/African American; (4) Native Hawaiian/Pacific Islander; (5) White; (6) Hispanic; (9) Multiple; (98) Unknown/unavailable.
†COVID-related risk factors: (0) No; (1) cancer treatment within 1 year; (2) chronic heart disease; (3) hypertension; (4) chronic lung disease (asthma, chronic obstructive pulmonary disease
(COPD) and interstitial lung disease (ILD)); (5) immunosuppression.
‡C1, clinical score at the first sample collection date.
§Days from symptom onset at the first sample collection.
||Cmax, maximum clinical score during the admission after the first time point sample collection.
¶Days from symptom onset at the first day Cmax was recorded in deteriorated patients.
#Collected sample or data types at the first sample collection date.
E, plasma cytokine/chemokine ELISA; F1, flow cytometry PBMC cell composition staining; F2, flow cytometry T cell surface staining; F3, flow cytometry T cell intracellular staining; G, plasma
anti-S1-IgG; M, plasma anti-S1-IgM; N, nasopharyngeal viral load; S, saliva viral load.
Extended Data Table 3 | Adjusted least square means difference in immune response at baseline between male and female
patients with COVID-19 in cohort A and male and female HCW controls
*Adjusted for age, BMI, days from symptom onset, tocilizumab treatment, corticosteroid treatment and ICU status.
†log10(SARS-CoV-2 copies per ml); nasopharyngeal nPT_F = 33, nPt_M = 30; saliva nPT_F = 20, nPt_M = 18.
‡OD450; nPT_F = 44, nPt_M = 39.
§log10(pg ml−1); nPT_F = 48, nPt_M = 43.
||As a percentage of live cells, unless indicated otherwise; nPT_F = 46, nPt_M = 42.
¶nPT_F = 33, nPt_M = 40.
#As a percentage of CD3-positive cells; nPT_F = 49, nPt_M = 42.
P values were determined using two-sided t-test and Morel–Bokossa–Neerchal correction.
Extended Data Table 5 | Adjusted least square means difference over time in immune response between male and female
patients with COVID-19 in cohort B and male and female healthy HCW controls
Article
Our intestines are constantly exposed to large amounts of antigens derived from diet and commensal microbes. The interaction of these antigens with the immune system takes place primarily in gut-associated secondary lymphoid structures, including gut-draining mesenteric lymph nodes (mLNs) and Peyer’s patches, where gut-associated germinal centres (gaGCs) provide a site for the hypermutation of immunoglobulin genes even under steady state5,9. B cell antigen receptor (BCR)-driven selection and affinity maturation of antibodies occur efficiently in gaGCs upon oral immunization6,10. However, given previous reports that steady-state gaGCs can form in a BCR-independent fashion2,6, show little evidence of BCR-driven selection of specific antibodies at the sequence level3, and are associated instead with the selection of polyreactive immunoglobulins4, it has been postulated that gaGCs may act predominantly as diversifiers of the immunoglobulin repertoire, rather than fostering affinity maturation towards commensal microbes (reviewed in refs. 5,8). We thus sought to determine the extent to which germinal-centre selection and antibody affinity maturation occur in the midst of chronic antigenic stimulation, and to define the impact of the microbiota on these processes.
To estimate the rate of B cell selection in steady-state gaGCs, we first used in situ photoactivation of mice engineered to express photoactivatable green fluorescent protein (PA-GFP)11,12 (Fig. 1a and Extended Data Fig. 1a) to sequence B cell immunoglobulin heavy chain genes (Igh) from 20 individual gaGCs from various mLNs of 5 mice housed under SPF conditions. Clonal diversity in SPF gaGCs spanned a wide range, with a median of 33 clones per germinal centre (using the Chao1 estimator function), a D50 value (the fraction of clones accounting for 50% of sequenced cells) of 0.20, and 30% of B cells belonging to the largest clone in the germinal centre (Fig. 1b, c). One of the 20 samples sequenced contained a highly dominant clone that accounted for 64% of cells in that germinal centre (Fig. 1b, c). Analysis of somatic mutations within this clone (Fig. 1d) showed the nested expansion of nodes with increasing numbers of mutations (indicated by arrows) typical of sequential positive selection.
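As an illustration only, the three per-germinal-centre statistics quoted above (Chao1 richness, D50 and the share of the largest clone) can be computed from a list of per-clone cell counts. This is a minimal sketch under stated assumptions: the bias-corrected form of Chao1 is assumed because the text does not say which variant was used, the function names are hypothetical, and the clone sizes are invented for the example rather than taken from the data.

```python
def chao1(clone_sizes):
    """Bias-corrected Chao1 richness estimate (assumed variant) from per-clone cell counts."""
    s_obs = len(clone_sizes)                      # clones actually observed
    f1 = sum(1 for n in clone_sizes if n == 1)    # clones seen once (singletons)
    f2 = sum(1 for n in clone_sizes if n == 2)    # clones seen twice (doubletons)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def d50(clone_sizes):
    """Fraction of clones, taken largest first, needed to cover 50% of sequenced cells."""
    total = sum(clone_sizes)
    covered = 0
    for rank, n in enumerate(sorted(clone_sizes, reverse=True), start=1):
        covered += n
        if covered >= total / 2:
            return rank / len(clone_sizes)


def largest_clone_fraction(clone_sizes):
    """Share of sequenced cells that belong to the largest clone."""
    return max(clone_sizes) / sum(clone_sizes)


# Hypothetical germinal centre: 18 cells sequenced, 9 clones observed.
sizes = [6, 3, 2, 2, 1, 1, 1, 1, 1]
print(chao1(sizes), d50(sizes), largest_clone_fraction(sizes))
```

A low D50 together with a high largest-clone fraction indicates a germinal centre dominated by few clones, which is the pattern the 64% clone above represents.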
1Laboratory of Lymphocyte Dynamics, The Rockefeller University, New York, NY, USA. 2Laboratory of Mucosal Immunology, The Rockefeller University, New York, NY, USA. 3Mucosal Immunology Group, Department of Pediatrics, University Medical Center Rostock, Rostock, Germany. 4These authors contributed equally: Carla R. Nowosad, Luka Mesin. ✉e-mail: mucida@rockefeller.edu; victora@rockefeller.edu
[Fig. 1 image not reproduced. Panel labels recoverable from the extraction: FDCs (anti-CD35-Cy3); Sort PA+ cells; Igh sequencing; Dominant clone, Expanded clones, Singletons; UA, Inferred precursor, 1 mutation, n cells with same sequence; Steady-state gaGCs, AicdaCreERT2/+.Rosa26Confetti/Confetti, Tamoxifen (2×), Δt; Caecal–colonic mLN at day 7, day 15 and day 21 post-tamoxifen; Dominance × density, NDS; Fate-mapped fluorescent B cells (%). Axes: Days post-tamoxifen; Weeks post-tamoxifen.]
Fig. 1 | Kinetics of clonal selection in steady-state gaGCs. a, Experimental setup for in situ photoactivation of gaGCs. mLNs from photoactivatable (PA), GFP-transgenic, SPF mice are photoactivated and dissected; photoactivated (PA+) cells are sorted and Igh genes sequenced. FDC, follicular dendritic cell, visualized with an antibody against Cy3-labelled CD35. b, Clonal composition of individual germinal centres from five mice (SPF1–5), obtained as in a. J, jejunal mLN; I, ileal mLN; C, caecal-colonic mLN. c, Quantification of data in b. Each symbol represents one germinal centre; medians indicated by centre lines. d, Relationships between Igh sequences of B cells derived from the largest B cell clone in b. Arrows indicate putative positive selection events. UA, unmutated ancestor. e, Experimental setup for ‘Brainbow’ fate-mapping. Steady-state gaGCs from AicdaCreERT2/+.Rosa26Confetti/Confetti mice are recombined to produce different colours by treatment with tamoxifen and followed over time (Δt). f, Representative multiphoton images of mLNs from SPF mice at different times after tamoxifen treatment. Blue is collagen (second harmonics); white is autofluorescence; other colours are from the Confetti allele. Numbers in parentheses represent normalized dominance scores (NDS; that is, the frequency of B cells in a germinal centre that carry the dominant colour after normalization for coloured cell density). Scale bars represent 200 μm in main images and 50 μm in close-ups. g, Quantification of multiple images as in f. Left, density of coloured cells (that is, fluorescent cells in the germinal-centre dark zone). Centre, colour dominance (that is, frequency of the most-common colour). Right, NDS (excludes germinal centres with density < 0.4). Each symbol represents one germinal centre. Only germinal centres with a density of more than 0.4 fluorescent cells per 100 μm² are included in the NDS calculation. Filled symbols indicate data from two mice in which tamoxifen was administered intraperitoneally. Medians are indicated. Data are from 3–5 mice per time point at days 14–35, and 1 to 2 mice for day 7 and later time points. h, SPF S1pr2CreERT2 × Rosa26Stop-tdTomato mice were given two doses of tamoxifen, two days apart, to label germinal-centre B cells. The graph shows the proportion of tdTomato+ mLN B cells assayed by flow cytometry at the time points indicated, counting from the first dose of tamoxifen. Each symbol represents one mouse. Half-lives were quantified using a one-phase exponential decay function (black line). CI, confidence interval. Gating strategies are detailed in Extended Data Fig. 1a–c.
This was confirmed in a much larger number of germinal centres using Brainbow13 multicolour fate-mapping11 (see Supplementary Information). We fate-mapped steady-state gaGC B cells in AicdaCreERT2/+.Rosa26Confetti/Confetti (AID-Confetti) mice held under SPF conditions by administering two doses of tamoxifen, two days apart (Fig. 1e). This labelled roughly 50% of B cells in both mLNs and Peyer’s patches, as estimated by the density of coloured cells in the germinal centres and by flow-cytometry experiments (Fig. 1f, g and Extended Data Fig. 1b). This fraction decreased progressively after labelling, probably as a result of germinal-centre B cells being replaced by incoming unlabelled clones as the response evolved (Fig. 1f, g). Because density estimations using Brainbow are sensitive to dropout of low-fluorescence germinal centres, we measured germinal-centre turnover by flow cytometry using the S1pr2CreERT2 BAC transgene14 crossed to the Rosa26Stop-tdTomato reporter. Pulse-labelling using this model allowed us to place the half-life of gaGC B cells at roughly two weeks under SPF conditions (Fig. 1h and Extended Data Fig. 1c).
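To make the half-life estimate concrete, the sketch below fits a one-phase exponential decay to pulse-labelling data and converts the rate constant into a half-life. It is illustrative only and rests on assumptions not stated in the text: the decay is taken to fall to zero (no plateau term), the fit is done by ordinary least squares on log-transformed values rather than by the nonlinear fit the authors may have used, and the data points are synthetic, generated with a 14-day half-life.

```python
import math


def fit_half_life(days, labelled_percent):
    """Fit y = y0 * exp(-k * t) by least squares on ln(y).

    Returns (y0, k, half_life). Assumes the labelled fraction decays to zero.
    """
    logs = [math.log(y) for y in labelled_percent]
    n = len(days)
    mean_t = sum(days) / n
    mean_log = sum(logs) / n
    slope = (
        sum((t - mean_t) * (v - mean_log) for t, v in zip(days, logs))
        / sum((t - mean_t) ** 2 for t in days)
    )
    k = -slope                                # decay rate constant (per day)
    y0 = math.exp(mean_log + k * mean_t)      # back-transformed intercept
    return y0, k, math.log(2) / k


# Synthetic example: 50% of B cells labelled at day 0, halving every 14 days.
days = [0, 7, 14, 21, 28]
percent = [50.0 * 0.5 ** (t / 14) for t in days]
y0, k, t_half = fit_half_life(days, percent)
print(f"y0 = {y0:.1f}%, k = {k:.4f} per day, half-life = {t_half:.1f} days")
```

With real measurements, a log-linear fit weights low values more heavily than a direct nonlinear fit does, so the two approaches can give slightly different half-lives.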
[Fig. 2 image not reproduced. Panel labels recoverable from the extraction: mLN; YFP, RFP, CFP + RFP; n cells with same sequence, Inferred precursor, 1 mutation; Secondary only, MG053 (negative control), S078 (mutated), S078.U (reverted), S078 (P = 0.016); Secondary only, G200 (negative control), S120 (mutated), S120.U (reverted), S120 (P = 0.016); M, U; clones S078, S212, S120 and MG053 (negative control). Axes: A450; Monoclonal antibody (nM).]
Fig. 2 | Selection of commensal-binding clones in steady-state gaGCs. a, Relationships among Igh sequences from B cells of high-NDS germinal centres, sorted as in Extended Data Fig. 2a. Left, images and pie charts show clones (inner ring, with grey representing the major clone) and Brainbow colour distributions (outer ring) for each germinal centre. Numbers within images are NDS values; numbers in pie charts are numbers of cells in the major clone by the total number of cells sequenced. Right, phylogenies for major clones (grey in the pie charts). Arrows indicate ‘clonal burst’ points; names are indicated whenever a recombinant monoclonal antibody was generated from a sequence (see Supplementary Table 1). CFP, cyan fluorescent protein; RFP, red fluorescent protein; YFP, yellow fluorescent protein; UA, unmutated ancestor. Scale bars, 50 μm. Additional trees are in Extended Data Fig. 2b, c. b, Binding of monoclonal antibodies to faecal bacteria from specific-pathogen-free (SPF) mice. Gated on SYTO BC+, DAPI– live bacteria (Extended Data Fig. 2d; see Methods). Clones MG053 and G200 (see below) are non-bacteria-reactive negative controls. The graphs at the right summarize seven independent experiments (M, mutated; U, unmutated); background (percentage positive in secondary only) is subtracted from all data points. P values obtained from two-tailed Wilcoxon paired samples test. c, Binding of monoclonal antibodies to faecal bacteria, assessed by ELISA. Lines show means of three assays. A450, absorbance at 450 nm. d, As in c, but showing monoclonal antibodies S078 and S120 as well as their unmutated ancestors.
Despite this rapid turnover, the normalized dominance score (NDS, an estimate of the frequency of B cells in a germinal centre that carry the dominant colour11,15; see Supplementary Information) in gaGCs increased progressively to day 23 after tamoxifen, when 11% of germinal centres (6 of 57) scored 0.75 or higher (Fig. 1f, g). The strongest clonal expansions that occur in mLN germinal centres are therefore large and rapid enough to generate dominant lineages, despite the replacement of labelled clones with incoming unlabelled B cells. In germinal centres within Peyer’s patches, clonal selection progressed at a slower rate (Extended Data Fig. 1d, e), as expected from their much larger size and similar rate of turnover (Extended Data Fig. 1f, g). However, germinal centres with NDS values of more than 0.5 could occasionally be detected as early as day 14 after tamoxifen, peaking at day 23 after tamoxifen, when 15% of Peyer’s patches (3 of 20) had reached NDS values of more than 50% (Extended Data Fig. 1e). We conclude that clonal selection is detectable in gaGCs, despite chronic exposure to a high burden and diversity of foreign antigens and the rapid turnover of B cell clones.
To understand the relationship between clonal selection and affinity maturation in gaGCs, we used vibratome slicing of agarose-embedded AID-Confetti lymph nodes11 (Extended Data Fig. 2a) to isolate gaGCs containing ‘winner’ clones, where antigen-driven selection is most likely to have occurred11. Sequencing of Igh from B cells sorted from
[Fig. 3 image not reproduced. Panel labels recoverable from the extraction: mLN, PP; NDS; Days post-tamoxifen; Days 20–23; P = 0.0084; UA, G226, M218, M218.U; RFP; Inferred precursor, 1 mutation, n cells with identical sequence; antibodies M216, M218, M220, M222, M224, M228 and M232, with reverted forms M216.U, M218.U and M220.U; ED38 (positive control), MG053 (negative control), Secondary; strains B.o., C.i., B.c., L.r., E.f., A.m., M.i., F.p., C.c., Ak.m., Bl.c. Axes: Anti-human IgG1 AF647; A450; Monoclonal antibody (nM).]
Fig. 3 | Accelerated selection in gaGCs of germ-free and Oligo-MM12-colonized mice. a, Representative multiphoton images of germ-free mLNs and Peyer’s patches at different times after treatment with tamoxifen. Blue represents collagen (second harmonics); white shows autofluorescence; other colours are from the Confetti allele. Scale bars represent 200 μm (Peyer’s patch overview) and 50 μm in close-ups. Numbers in yellow are NDS values. b, Quantification of multiple images as in a for mLNs and Peyer’s patches. Left, density of coloured cells (fluorescent cells in the germinal-centre dark zone). Centre, colour dominance (frequency of the most-common colour). Right, NDS (frequency of B cells in a germinal centre that carry the dominant colour). Each symbol represents one germinal centre; medians indicated. Only germinal centres with a density of more than 0.4 fluorescent cells per 100 μm² are included in the NDS calculations. c, Proportion of germinal centres with NDS values of more than 0.75 in mLNs (top) and more than 0.5 in Peyer’s patches (bottom) under SPF and germ-free conditions at 20–23 days after tamoxifen. For SPF and germ-free mLN gaGCs, n = 57 and 27, respectively. For Peyer’s patch gaGCs, n = 20 and 9, respectively. SPF data are from Fig. 1g and Extended Data Fig. 1e. P values are from two-tailed Fisher’s exact tests. Error bars represent exact binomial 95% confidence intervals. Data for b, c are from 3–5 mice per time point, except for days 5–7 which were from 1 mouse. d, e, Relationship among Igh sequences from B cells of germ-free (d) and Oligo-MM12 (e) germinal centres with high NDS values. Details as in Fig. 2a. Additional trees are in Extended Data Fig. 6a, b. Scale bars, 50 μm. f, Flow cytometry showing the binding of monoclonal antibodies to faecal bacteria from Oligo-MM12-colonized mice, detected using anti-human IgG1 Alexa Fluor 647. ED38, polyreactive positive control monoclonal antibody; MG053, negative control. The dotted line placed at 10² is for reference purposes. g, Dot blot showing the binding of Oligo-MM12 monoclonal antibodies to a subset of cultured Oligo-MM12 strains (black font; see Supplementary Table 4). Oligo-MM12 strains tested are Acutalibacter muris (A.m.), Clostridium innocuum (C.i.), Enterococcus faecalis (E.f.), Muribaculum intestinale (M.i.), Flavonifractor plautii (F.p.), Clostridium clostridioforme (C.c.), Akkermansia muciniphila (Ak.m.) and Blautia coccoides (Bl.c.). Bacteroides ovatus (B.o.) and Bacteroides caccae (B.c.) are negative controls (blue font). Arrows indicate E.f., which is bound only by M218. h, i, Binding of monoclonal antibody M218 to cultured E. faecalis (h) and of three Oligo-MM12 antibodies to faecal bacteria from Oligo-MM12-colonized mice (i), as measured by flow cytometry. Gated on SYTO BC+, DAPI– live bacteria (Extended Data Fig. 2d). Data in f–i are representative of experiments carried out on at least two separate occasions. j, Binding of monoclonal antibodies to faecal bacteria, measured by ELISA. Lines show means of two assays.
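The two-tailed Fisher’s exact test named in the legend can be sketched directly from the hypergeometric distribution. The code below is an illustration, not the authors’ analysis script: the function name is hypothetical, and the two-tailed P value is defined, as one common convention, as the sum of the probabilities of all tables that are no more likely than the observed one. The counts in the usage lines are those reported in the text (6 of 9 germ-free versus 3 of 20 SPF Peyer’s patch germinal centres above the NDS threshold, and 11 of 14 germ-free versus 1 of 20 SPF mLN germinal centres with a dominant clone above 50%).

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that x of the col1 "positives" fall in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * tolerance
    )


# Rows: germ-free, SPF. Columns: above threshold, at or below threshold.
p_peyers = fisher_exact_two_tailed(6, 3, 3, 17)     # 6 of 9 versus 3 of 20
p_dominant = fisher_exact_two_tailed(11, 3, 1, 19)  # 11 of 14 versus 1 of 20
print(p_peyers, p_dominant)
```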
such germinal centres showed evidence of ‘clonal bursts’: jackpot-type positive selection events in which multiple B cells descending from a single somatic hypermutation (SHM) variant account for a large fraction of cells in a germinal centre11 (Fig. 2a and Extended Data Fig. 2a–c). Because clonal bursts are regularly associated with the acquisition of affinity-enhancing mutations11, we produced recombinant monoclonal antibodies16 using burst-point sequences to probe for binding to commensal bacteria (Supplementary Table 1). Despite the variation inherent to bacterial flow cytometry, two of seven antibodies produced from burst-associated immunoglobulin sequences reproducibly bound faecal bacteria (Fig. 2b and Extended Data Fig. 2d, e). Binding followed different patterns: whereas monoclonal antibody S078 bound strongly to a small population of bacteria, S120 bound with moderate intensity to a much larger cohort (Fig. 2b). These two antibodies, as well as two other clones (S210 and S212), reacted with bacteria-rich centrifugation fractions, as measured by enzyme-linked immunosorbent assay (ELISA; Fig. 2c). Only S210 and S212 showed a mild degree of polyreactivity using standard measures4,17 (Extended Data Fig. 2f). Reversion of somatic mutations in S078 and S120 resulted in decreased binding to bacteria by both flow cytometry and ELISA (Fig. 2b, d). Thus, commensal binding with the characteristic features of antigen-driven maturation is detectable in steady-state gaGCs when analysis is focused on strongly selected gaGC winner clones.
To investigate the influence of commensal diversity on gaGC selection dynamics, we rederived AID-Confetti mice into germ-free conditions, in which germinal centres still form18, as well as into stable vertical colonization with a consortium of 12 bacterial strains representing major phyla present in the mouse gut19 (Oligo-MM12; Extended Data Fig. 3). When compared with SPF mice, germ-free mice had higher frequencies of germinal-centre B cells in jejunal and ileal mLNs20, and lower frequencies in duodenal and caecal-colonic mLNs and Peyer’s patches (Extended Data Fig. 4a, b). Germ-free germinal centres were strongly skewed away from IgG2b and IgA towards IgG1 (Extended Data Fig. 4c). Colonization with Oligo-MM12 microbiota partly restored the phenotype of SPF mice, increasing germinal-centre B cell frequency in distal Peyer’s patches and IgG2b in proximal Peyer’s patches (Extended Data Fig. 4d).
Multicolour fate-mapping showed that strongly selected germinal centres accumulated at a markedly faster rate in germ-free mice than […] or higher, compared with around 11% in SPF mice (Fig. 3c). This was confirmed at the Igh sequence level: 11 of 14 mLN germinal centres picked at random from vibratome slices had dominant clones that accounted for more than 50% of all B cells in the germinal centre, compared with 1 of 20 germinal centres in SPF conditions (Fig. 1a–c and Extended Data Fig. 5a–d). Faster selection of Brainbow colours was also observed in germ-free Peyer’s patches, where 6 of 9 germinal centres exceeded an NDS of 0.5 by day 20–23 after tamoxifen, compared with 3 of 20 under SPF conditions (Fig. 3a–c). Germinal-centre selection in Oligo-MM12-colonized mice fell between the SPF and germ-free rates (Extended Data Fig. 5e–g). Therefore, selection in gaGCs is not dependent on a fully diverse microbiota, and in fact becomes accelerated in the absence of commensal bacteria.
Clonal phylogenies of single-coloured germinal centres from germ-free and Oligo-MM12-colonized mice revealed strong clonal bursting, as shown by the presence of large expansions of B cells with identical variable heavy chain (VH) immunoglobulin sequences and multiple inferred descendants (Fig. 3d, e and Extended Data Fig. 6a, b, […] sequences that were strongly selected under each condition (including those indicated by named arrows in Fig. 3d, e and Extended Data Fig. 6a, b; Supplementary Table 1). Of seven monoclonal antibodies cloned from Oligo-MM12, three (M216, M218, and M220) bound faecal bacteria fractions from Oligo-MM12-colonized mice (Fig. 3f, i, j), and one (M218) bound to cultured Enterococcus faecalis, as assessed by both flow cytometry and dot blotting (Fig. 3g, h). Reversion of somatic mutations resulted in larger decreases in binding for M216 and M220 and a modest decrease for M218, as shown by flow cytometry and ELISA (Fig. 3h–j). Thus, as with SPF microbiota, vertical colonization with Oligo-MM12 triggers efficient antigen-driven maturation towards commensals in steady-state gaGCs.
We subjected the nine monoclonal antibodies obtained from germ-free winner clones to an array of assays that covered major potential sources of antigen, including food, autoantigens (anti-nuclear antibody and intestinal tissue antigens), faecal bacteria, and a standard polyreactivity panel. None of the germ-free antibodies reacted above background levels in any of these assays (Extended Data Fig. 6c–f). To determine whether germ-free germinal centres were indeed populated in a BCR-dependent manner, or simply stochastically owing to a lack of antigenic stimulation, we searched for commonalities in the Igh sequences of winner clones, along with additional sequences obtained from single germinal-centre B cells from mLN vibratome slices and whole mLNs and Peyer’s patches of wild-type germ-free mice. This revealed substantial overlap of clonotype ‘themes’ across individuals, which we regarded as unlikely to be random given the small pool of germinal centres sampled. Two themes were particularly prevalent (Fig. 4a, b). One used the relatively rare VH1–47 segment, coupled to joining segments JH4 or JH2 via an 11–12-amino-acid heavy-chain complementarity-determining-region 3 (CDRH3) sequence that begins with the consensus sequence ARGSNY (Fig. 4a). No commonalities in light-chain usage were detected at this sampling depth. After allowing a one-amino-acid substitution in the ARGSNY motif, this theme was present in 5 of 7 germ-free mice, accounting for 16.6% of all cells sequenced (Fig. 4b and Extended Data Fig. 7a). Clones with these characteristics represented only 0.00006% of all reads (1 in roughly 17,000) in a previously published database of naive B cell Igh sequences from C57BL6 mice, containing more than 30 million reads representing 2.5 million unique rearrangements from 5 mice21 (P < 2.2 × 10⁻¹⁶ compared with germ-free gaGCs).
[Fig. 4 image not reproduced. Panel labels recoverable from the extraction: a, sequence logos (Bits against Position); b, GF1–GF7, MM12 1–3, Cells sequenced, Individual germinal centres (AID-Confetti), mLN slices (wild type), Whole organs (wild type), with legend VH1–47/ARGSNY.../JH4, VH1-47/ARGSNY.../JH2, Other VH1-47, VH1-12/AREGFAY/JH3, Other VH1–12, All other V segments; c, VH1–47/ARGSNY and VH1–12/AREGF(A/D)Y (exact match); d, All public clonotypes and Larger expansions (30 total wells or more); SPF, Germ-free, Oligo-MM12; scale, 1,000 clone*wells.]
Fig. 4 | Prominent public clonotypes in gaGCs of germ-free and Oligo-MM12-colonized mice. a, CDR H3 amino-acid sequence logos for B cells bearing public VH1–47/JH4 or JH2 rearrangements of 11 or 12 amino acids (left) or VH1–12/JH3 rearrangements of 7 amino acids (right), sequenced from germ-free gaGCs. b, Frequency across mice or samples of B cells that fit public-clonotype criteria or carry other VH1–47 or VH1–12 rearrangements. Each bar represents one sample. Data are for 7 germ-free mice (GF1–7) and 3 Oligo-MM12-colonized mice (MM12 1–3). D, duodenal mLN; I, ileal mLN; C, caecal-colonic mLN; P, Peyer’s patch. c, Circos plots showing the distribution of VH1–47 and VH1–12 public clonotypes in a second cohort of sex- and age-matched SPF, germ-free and Oligo-MM12-colonized mice. Each segment represents pooled germinal-centre B cells from the mLNs and Peyer’s patches of one mouse, with clones ordered clockwise from largest to smallest. Only clones containing identical CDR H3s are linked (see Methods for a full description). Samples were gated as in Extended Data Fig. 9b; all samples were sequenced in a single experiment. d, Circos plots as in c, showing all public IgH clonotypes shared across mice housed under the indicated conditions. Lines connect clonotypes with the same VH/CDR H3/JH amino-acid sequence. Left, all clones; right, only those clones spanning 30 or more wells in total. In c, d, only clones that were found in at least two wells from the same mouse are linked.
A second public clonotype was encoded by the rare VH1–12 segment, with the stricter seven-amino-acid consensus CDRH3 sequence, AREGFAY, followed by JH3 (Fig. 4a). Again, no patterns of light-chain usage were identified. Allowing a one-amino-acid substitution in CDRH3, this
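The matching rule used to call these public clonotypes (a given VH segment, a CDRH3 of the stated length, and at most one amino-acid substitution relative to the consensus motif) can be sketched as follows. This is a simplified illustration under assumptions: the `THEMES` table and function name are hypothetical, the J segment is not checked, and the example CDRH3 strings are invented apart from the ARGSNY and AREGFAY motifs given in the text.

```python
# Consensus motifs and permitted CDRH3 lengths, as described in the text.
THEMES = {
    "VH1-47": {"motif": "ARGSNY", "lengths": {11, 12}},  # CDRH3 begins with motif
    "VH1-12": {"motif": "AREGFAY", "lengths": {7}},      # motif spans the whole CDRH3
}


def is_public_clonotype(v_segment, cdrh3, max_substitutions=1):
    """Return True if the rearrangement fits one of the two public themes.

    The start of the CDRH3 is compared with the consensus motif position by
    position, and up to max_substitutions mismatches are tolerated.
    """
    theme = THEMES.get(v_segment)
    if theme is None or len(cdrh3) not in theme["lengths"]:
        return False
    motif = theme["motif"]
    mismatches = sum(x != y for x, y in zip(cdrh3[:len(motif)], motif))
    return mismatches <= max_substitutions


# Hypothetical CDRH3 sequences, for illustration only.
examples = [
    ("VH1-47", "ARGSNYYAMDY"),  # exact motif, 11 amino acids
    ("VH1-47", "ARRSNYYAMDY"),  # one substitution (G to R)
    ("VH1-47", "ARRSNFYAMDY"),  # two substitutions
    ("VH1-12", "AREGFDY"),      # one substitution (A to D)
]
results = [is_public_clonotype(v, seq) for v, seq in examples]
print(results)
```

A position-by-position comparison is sufficient here because the rule allows substitutions only; insertions and deletions would change the CDRH3 length and are excluded by the length check.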
Extended Data Fig. 2 | Binding characteristics of ‘winner’ gaGC clones from SPF mice. a, Gating strategy for isolating AID-Confetti single germinal centres shown in b, c, Figs. 2a, 3d, e and Extended Data Figs. 6a, b, 8a. CR, CFP and/or RFP; non CR, non-CFP, non-RFP; GY, GFP and/or YFP. b, c, Additional Igh sequence relationships among B cells from high-NDS germinal centres (b) and one low-NDS germinal centre (c) (see Fig. 2a). Scale bars, 50 μm. In c, each tree is for a separate clone (defined as a unique V(D)J rearrangement). Only clones with more than five cells are shown (grey slices in pie charts). d, Gating strategy for bacterial flow cytometry, performed in e, Figs. 2b, 3f, h, i and Extended Data Fig. 6d. e, Flow-cytometry analysis of the binding of recombinant monoclonal antibodies to faecal bacteria isolated from SPF mice. Plots gated as in d. All plots are representative of data obtained from at least two separate experiments. f, Summary of the reactivity of SPF monoclonal antibodies, assayed by ELISA against food protein extracts, autoantigens (anti-nuclear antibody, ANA), and a five-antigen polyreactivity panel comprising single-stranded DNA, double-stranded DNA, keyhole limpet haemocyanin (KLH), insulin and LPS. Shown are background-subtracted OD450 values. Data representative of assays repeated in at least three separate experiments.
[Extended Data Fig. 3 image not reproduced. Panel labels recoverable from the extraction: a, Quantitative PCR, total 16S (SPF, Oligo-MM12; Feces, Cecum); b, c, Quantitative PCR, species-specific 16S (Ct value; LOD; P, F1). Species legend: Clostridium clostridioforme YL32, Clostridium innocuum I46, Acutalibacter muris KB18, Akkermansia muciniphila YL44, Bacteroides caecimuris I48, Blautia coccoides YL58, Enterococcus faecalis KB1, Flavonifractor plautii YL31, Muribaculum intestinale YL27, Lactobacillus reuteri I49, Turicimonas muris YL45.]
Extended Data Fig. 3 | Stable vertical transmission of the Oligo-MM12 consortium. a–c, qPCR of total (a) and strain-specific (b, c) 16S DNA from faecal samples of mice stably colonized with the Oligo-MM12 consortium. In a, ΔCt values were calculated in respect to a reference SPF sample, marked by the black filled symbols, with which all other values were compared. In c, Ct values were used to quantify the relative abundance of each species (see Methods). LOD, limit of detection. F1 refers to the first generation after the parental strain (P, colonized by gavage). Note that Bifidobacterium animalis (YL2) is usually undetectable in faeces19.
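For readers unfamiliar with the ΔCt convention, the sketch below shows how a Ct difference relative to a reference sample is commonly converted into a relative abundance. It is a generic illustration, not the calculation from the Methods: it assumes that the amplified target doubles every cycle (100% primer efficiency), and the function name and Ct values are hypothetical.

```python
def relative_abundance(ct_sample, ct_reference):
    """Abundance of a sample relative to a reference, from qPCR Ct values.

    Assumes perfect doubling per cycle, so each extra cycle needed to reach
    threshold corresponds to half as much starting template.
    """
    delta_ct = ct_sample - ct_reference
    return 2.0 ** (-delta_ct)


# Hypothetical Ct values for total 16S DNA.
reference_ct = 15.0   # reference SPF sample
same_load = relative_abundance(15.0, reference_ct)
lower_load = relative_abundance(18.0, reference_ct)  # three cycles later
print(same_load, lower_load)
```

Under this assumption a sample that crosses threshold three cycles after the reference carries one eighth of the reference load.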
Extended Data Fig. 4 | Frequency and isotype distribution of gaGCs in germ-free and Oligo-MM12-colonized mice. a, Gating strategy for analysing the frequency of germinal centres and distribution of isotypes (results shown in b–d). b, Frequency of cells with the phenotype of germinal centres (CD38–FAShi) among total B220+ B cells in the indicated organs of mice raised under the indicated conditions. Each symbol represents one mouse. SPF, n = 25; germ-free (GF), n = 16; Oligo-MM12, n = 11. c, Frequency of germinal-centre B cells positive for the indicated surface BCR isotype in different organs of mice raised under the indicated conditions. Data are from at least three mice per group, as in d. Data are presented as means ± s.e.m. d, Statistical analysis of selected isotypes and anatomical locations, using data from c. Each symbol represents one mouse. Lines indicate medians; P values are obtained from two-tailed Kruskal–Wallis tests carried out on each trio, with Dunn’s multiple comparisons post-test. All P values below 0.05 are reported.
[Extended Data Fig. 5 image not reproduced. Panel labels recoverable from the extraction: a, Individual GC gating strategy (FSC-A, SSC-A, FSC-H, TCRβ APC-Cy7, B220 BV421, FAS PE-Cy7, CD38 APC; gates Single Cells, B cells, GC); Dominant clone, Expanded clones, Singletons; c, Clonal richness (clones per GC, Chao1), Clonal diversity (D50), Size of largest clone (% of cells in the GC), mLN GCs; d, P < 0.0001; Oligo-MM12 mLN and Peyer’s patch, day 21 post-tamoxifen; Mesenteric LN; Colored cell density, Color dominance (% of fluorescent B cells), Density × dominance (density > 0.4 only); PSPFxMM12 = 0.015; PGFxMM12 > 0.999. Axis: Days post-tamoxifen.]
Extended Data Fig. 5 | Clonal selection in germ-free and Oligo-MM12-colonized mice. a, Gating strategy for germ-free AID-Confetti single germinal centres used in b–d. b–d, Sequencing of Igh genes from B cells obtained from individual mLN germinal centres. Germinal-centre B cells were single-cell-sorted from fragments of vibratome slices containing single germinal centres. To avoid biased selection of germinal centres based on NDS or loss of germinal centres with a low density of coloured cells, mLNs were harvested at five to seven days after treatment with tamoxifen, before extensive selection or clonal turnover; both fluorescent and non-fluorescent cells were included in the sample. This unbiased selection ensures that data are comparable to those obtained using in situ photoactivation (Fig. 1a–d), which we could not perform because the photoactivatable GFP-transgenic strain is not available under germ-free status. b, Clonal composition of individual germinal centres from five mice (GF1–GF5). C, caecal-colonic mLN; J, jejunal mLN. c, Quantification of data from b. Each symbol represents one germinal centre. d, Proportion of germinal centres in which the largest clone accounts for more than 50% of all B cells in mLNs of SPF mice (data from Fig. 1b) and germ-free mice (data from b). P values are from two-tailed Fisher’s exact tests. Centre bars represent the proportion in the sample; error bars show the exact binomial 95% confidence interval. e, Multiphoton images of Oligo-MM12 mLNs and Peyer’s patches at different times after treatment with tamoxifen. Blue represents collagen (second harmonics); white shows autofluorescence; other colours are from the Confetti allele. Scale bars, 200 μm (overviews), 50 μm (close-ups). N/D, NDS not determined owing to a low density of coloured cells. f, Quantification of images as in e for mLNs (top) and Peyer’s patches (bottom). Each symbol represents one germinal centre. Medians are indicated. Only germinal centres with a density of more than 0.4 fluorescent cells per 100 μm² are included in the NDS calculations. g, Proportion of germinal centres with NDS values of more than 0.75 in mLNs (top) and more than 0.5 in Peyer’s patches (bottom) under SPF, germ-free and Oligo-MM12 conditions at 20–23 days after tamoxifen; SPF and germ-free data are as in Fig. 3c. For SPF, Oligo-MM12 and germ-free mLN gaGCs, n = 57, 16 and 27, respectively; for gaGCs from Peyer’s patches, n = 20, 10 and 9, respectively. P values obtained by two-tailed Fisher’s exact tests. Error bars represent exact binomial 95% confidence intervals. All data are from three to five mice per time point.
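The exact binomial 95% confidence intervals mentioned in the legend are usually computed with the Clopper–Pearson method; whether the authors used exactly this construction is an assumption. The sketch below finds the two bounds by bisection on the binomial distribution, using only the standard library. The function names are hypothetical, and the counts in the usage lines are proportions reported in the text (6 of 57 and 11 of 14 germinal centres).

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, target, iterations=100):
    """Find p in (0, 1) with f(p) = target, for f increasing in p, by bisection."""
    low, high = 0.0, 1.0
    for _ in range(iterations):
        mid = (low + high) / 2
        if f(mid) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) confidence interval for k successes in n trials."""
    if k == 0:
        lower = 0.0
    else:
        # Lower bound: the p at which P(X >= k) equals alpha / 2.
        lower = _solve_increasing(lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    if k == n:
        upper = 1.0
    else:
        # Upper bound: the p at which P(X <= k) equals alpha / 2.
        upper = _solve_increasing(lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper


ci_spf = clopper_pearson(6, 57)    # 6 of 57 germinal centres
ci_gf = clopper_pearson(11, 14)    # 11 of 14 germinal centres
print(ci_spf, ci_gf)
```

With samples as small as 9 to 57 germinal centres these intervals are wide and asymmetric, which is why an exact method is preferred here over a normal approximation.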
[Extended Data Fig. 6 image not reproduced. Panel labels recoverable from the extraction: a, b, Germ-free and Oligo-MM12 trees (UA, M220.U, M220, RFP, mLN; Inferred precursor, 1 mutation, n cells with identical sequence); c, GF mAbs: ELISA panel (GF mAbs, n = 9); d, GF mAbs, flow cytometry on SPF fecal bacteria; e, GF mAbs, ELISA on SPF fecal bacteria (Abs. 450 nm against mAb concentration (nM)); f, GF mAbs, WB on SPF ileum protein extract. Antibodies: G082, G196, G198, G200, G202, G204, G206, G208, G226; controls: ED38 (+), MG053 (−), 3H9, secondary only.]
Extended Data Fig. 6 | Characteristics of ‘winner’ gaGC clones from
germ-free and Oligo-MM12-colonized mice. a, b, Additional Igh sequence
relationships among B cells from high-NDS germinal centres of germ-free (a)
and Oligo-MM12-colonized (b) mice. Details are as in Fig. 2a. Scale bars, 50 μm.
c, Reactivity summary of germ-free monoclonal antibodies assayed by ELISA
against food protein extracts, autoantigens (anti-nuclear antibody, ANA), and a
five-antigen polyreactivity panel. Shown are background subtracted OD450
values. d, Flow-cytometry analysis of the binding of monoclonal antibodies
from germ-free mice to faecal bacteria from SPF mice. Details are as in Fig. 2b.
e, ELISA analysis of the binding of monoclonal antibodies from germ-free mice
to faecal bacterial fractions from SPF mice. MG053 was assayed at three
dilutions only. Other monoclonal antibodies were assayed at dilutions
indicated on the x-axis. Lines show the means of two assays. f, Western blot
(WB) analysis of the binding of monoclonal antibodies from germ-free mice to a
protein extract from mouse ileum tissue, run on a single-well 4–15% gel and
blotted using a multiwell mask. Monoclonal antibody 3H9 is a DNA-specific
negative control. Data in c–f are representative of two or more independent
experiments.
[Figure: Extended Data Fig. 7. Panels: a, dendrograms for clonotype VH1-47 and clonotype VH1-12 (scale bars, 0.01; mouse key: GF 1, GF 2, GF 3, GF 5, GF 7, MM12 1, MM12 2); b, heat maps for VH1-47 and VH1-12 (rows, targeted AA; columns, mouse, with cell number above each column; side column, replacements; colour scale, frequency %, 0 to >50).]
Extended Data Fig. 7 | Mutational patterns in germ-free/Oligo-MM12 public
clonotypes. a, Dendrograms showing the sequence relationships between
VH1–47 and VH1–12 clones in different mice. All clones with up to two-amino-acid
differences from the public-clonotype CDR H3 motifs are included. b, Heat
maps showing the frequency of amino-acid replacements along the VH1–47 and
VH1–12 families in germ-free (blue) and Oligo-MM12 (green) mice, using the
same data as in Fig. 4b. Only mice with more than two cells within the specified
clone were included in the analysis. The number of cells analysed per mouse is
indicated at the top of each column. Only those amino acids mutated in at least
three (VH1–47) or two (VH1–12) mice are listed on the left, using Immunogenetics
(IMGT; http://www.imgt.org) numbering; to the right, the most frequent
amino-acid replacement in each mouse is given. Arrows indicate recurrent
amino-acid mutations found in five of six mice (VH1–47) or three of three mice
(VH1–12).
[Figure: Extended Data Fig. 8. Panels: a, Oligo-MM12-colonized mouse, VH1-12/AREGFAY/JH3 (images of cecal mLN, ileal mLN and PP; colours CFP, YFP, RFP, CFP/YFP; clonal tree key: n, n cells with identical sequence; inferred precursor; 1 mutation); b, GC frequency (y-axis, % of B cells; groups SPF, GF, PFC; x-axis, D mLN, J mLN, I mLN, C mLN, pPP, dPP; whole organs (WT)); c, PFC 1, PFC 2, PFC 3 (cells sequenced; legend: VH1-47/ARGSNY.../JH4, VH1-47/ARGSNY.../JH2, VH1-47/ARGSNY.../JH3, other VH1-47 H3, VH1-12/AREGFAY/JH3, other VH1-12, other V segments).]
Extended Data Fig. 8 | Stereotypical germ-free IgH clonotypes are present
in Oligo-MM12 and germ-free/dietary-protein-free conditions. a, Massive
expansion of a public VH1–12 clonotype across different secondary lymphoid
organs of mouse MM12 1 (from Fig. 4b), at 21 days after tamoxifen treatment.
Multiphoton images show all three germinal centres sequenced from this
mouse (yellow dotted boxes), magnified in the side panels. Scale bars, 200 μm
(overviews) and 50 μm (close-ups). mLN close-ups are from different image
acquisitions of the same germinal centre. A clonal tree of all cells from this
clone is shown at the bottom right. Arrowheads indicate clonal bursts and the
organ of origin of cells with that particular sequence. b, Frequency of cells with
a germinal-centre phenotype (CD38dim FAShi) among total B220+ B cells in the
indicated organs of mice raised on protein-free chow (PFC). Data for SPF and
germ-free mice are reproduced from Extended Data Fig. 4b. Each symbol
represents one mouse. For PFC, n = 8 mice. c, Clonal distribution of
germinal-centre B cells sequenced from the indicated tissues of three separate mice
(PFC1–3), with public clonotypes colour-coded. See also Fig. 4b. C,
caecal-colonic mLN; D, duodenal mLN; I, ileal mLN; PP, Peyer’s patch.
Extended Data Fig. 9 | Multiwell incidence-based Igh sequencing reveals
clonal overlap among individual mice and between microbial colonization
conditions. a, Overview of the incidence-based Igh sequencing method used
for c–g and Fig. 4c, d. To identify expanded public clonotypes among gaGC
samples from multiple mice with high confidence, we developed an
incidence-based sequencing strategy based on repeated sampling of the same
germinal-centre B cell population. We sorted multiple samples of 100 germinal-centre B
cells (usually 32 for mLN and 16 for Peyer’s patches) from 6 germ-free, 6 SPF, and
7 Oligo-MM12-colonized mice, and sequenced all BCRs in each sample, for a
total of roughly 80 thousand input B cells, plus 32 wells each of
non-germinal-centre B cells from the mLN of 3 germ-free and 3 SPF mice as controls. To avoid
counting as ‘public’ sequences that were spuriously present in different mice
owing to barcode misassignment or DNA contamination, we included in our
analysis only those clones that were represented by more than five reads in
any single well and found in at least two wells from the same sample. Key
bioinformatics steps are described in the figure; see Methods for a full
description of the bioinformatic pipeline. b, Gating strategy used for data in
c–g and Fig. 4c, d, described in a. c, Number of distinct clones per well, after
collapsing sequences with matching VH, JH, and CDR H3 nucleotide sequences.
Each symbol represents one well. Boxes represent medians and interquartile
ranges. As expected, non-germinal-centre B cell samples had many more total
clones per well than did germinal-centre B cells. d, Proportion of expanded
clones (present in more than one well per sample) in germinal-centre and
non-germinal-centre samples from mLNs and Peyer’s patches of mice held under
the specified conditions. e, Histograms showing Levenshtein distances
between the indicated consensus CDR H3 sequence and the CDR H3 sequence of
all clones in the indicated category. For ARGSNYXXXXDY, distances are plotted
for clones carrying the ‘correct’ VH1–47 gene or two ‘control’ VH regions with
similar usage frequency in our sample. P values were obtained by
Kruskal–Wallis test comparing all three conditions. Owing to the very low number of
total VH1–12 clones outside of the germ-free condition, distances to the
AREGFAY CDR H3 are compared between VH1–12 clones and all clones. P values
obtained by two-tailed Mann–Whitney U test. f, Fraction of clone*wells
containing public clonotypes in each condition, pooled from all mice. P values
were obtained by Fisher’s exact test. g, Venn diagram showing the number of
clones per condition (pooled from all mice) and overlap between conditions.
The clone in the centre of the graph (SPF/Oligo-MM12/germ-free overlap)
corresponds to the VH1–47 public clonotype. In f, g, data are as in Fig. 4d.
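The clone-collapsing, filtering and distance steps described in this legend can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline (which is described in their Methods). The function names and the read tuple layout are hypothetical. The filter is read here as "more than five reads in at least one well, and present in at least two wells of the same sample". The distance is plain Levenshtein distance with no special handling of the X positions in motifs such as ARGSNYXXXXDY, because the legend does not say how those positions were treated.

```python
from collections import defaultdict


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def expanded_clones(reads, min_reads=5, min_wells=2):
    """Collapse reads from ONE sample into clones and keep the expanded ones.

    reads: iterable of (well, vh, jh, cdr3_nt) tuples, one per sequencing read.
    A clone is the set of reads sharing VH, JH and CDR H3 nucleotide sequence.
    Returns {clone: {well: read_count}} for clones with more than `min_reads`
    reads in at least one well and detected in at least `min_wells` wells.
    """
    per_clone = defaultdict(lambda: defaultdict(int))
    for well, vh, jh, cdr3 in reads:
        per_clone[(vh, jh, cdr3)][well] += 1
    return {clone: dict(wells) for clone, wells in per_clone.items()
            if len(wells) >= min_wells and max(wells.values()) > min_reads}


# Hypothetical reads: one clone seen in two wells, one clone confined to one well.
reads = ([("w1", "VH1-47", "JH4", "seqA")] * 6
         + [("w2", "VH1-47", "JH4", "seqA")] * 2
         + [("w1", "VH1-12", "JH3", "seqB")] * 9)

kept = expanded_clones(reads)
print(sorted(kept))
print(levenshtein("AREGFAY", "AREGFVY"))
```

Requiring a clone to appear in at least two wells of the same sample is what makes the approach incidence-based: expansion is inferred from how often a clone is re-sampled, not from read depth alone.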
nature research | reporting summary
Corresponding author(s): Gabriel Victora, Daniel Mucida
Last updated by author(s): Jul 20, 2020
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Data analysis Data were analysed using FlowJo (Tree Star) v8.7 and v10.5.3, Prism (GraphPad) v8.3.0, ImageJ v1.51 and R v3.6.3. Sequencing analysis was
carried out using PANDAseq v.2.11, HighV-QUEST v. 1.6.9, VBASE2, FASTX Toolkit v0.0.13, Change-O v0.4.6, the T-Coffee algorithm
(Notredame et al. 2000) and GCtree v1 (DeWitt et al. 2018). Circular ideogram plots were created using Circos v. 0.69-9. Dendrograms
were generated using Clustal X v2.1 and FigTree v1.4.4. Data were presented using Adobe Illustrator v23.0.4 and 15.1.0.
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers.
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
Single cell sequences are available as a supplementary spreadsheet. These refer to Figures 2, S2, 3, 4, 5 and S6 as labeled in the spreadsheet. Incidence-based
sequencing data is available at https://github.com/victoraLab/MIBS.
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf
Data exclusions Data were excluded only for technical reasons. In bacterial flow cytometry, data were excluded if the background secondary antibody binding was too
high and no mAb binding could be observed (experiments appear in Figures 2, S2 and 4).
Replication All experiments were reproducible and were repeatable as detailed in figure legends. Sequencing in Figure 5c-g was carried out once, but on
a large sample size (6 or 7 animals per group in 3 groups).
Randomization Mice of either sex were used for most studies. For sequencing experiments in Figure 5c-g age and sex matched mice were used. Mice were
allocated into groups based on genotype and colonization status (not randomized)
Blinding Investigators were not blinded to group allocation; all of the measurements reported were objectively quantifiable.
Antibodies
Antibodies used B220- BV421, BioLegend, #103240, RA3-6B2, Lot: B288312, 1:200 dilution
B220- BV605, BioLegend, #563708, RA3-6B2, Lot: 8201934, 1:200 dilution
B220- BV711, BioLegend, #103255, RA3-6B2, Lot: B267109, 1:200 dilution
CD38- APC, BioLegend, #17-0381-82, 90, Lot: 2068810, 1:200 dilution
CD38- PerCP-Cy5.5, BD, #562770, 90, Lot: 9112983, 1:200 dilution
FAS- PE-Cy7, BD, #557653, Jo2, Lot: 9039631, 1:400 dilution
FAS- BV421, BD, #562633, Jo2, Lot: 9029848, 1:400 dilution
CD16/32 (Fc block)- Bio-X-Cell, BE0307, 2.4G2, 1:200 dilution
TCR β- APC-e780, Invitrogen, #47-5961-82, H57-597, Lot: 2114197, 1:200 dilution
CD3- BV785, BioLegend, #100232, 17A2, Lot: B277518, 1:200 dilution
CD4- BV785, BioLegend, #100552, RM4-5, Lot: B264992, 1:200 dilution
CD8- BV785, BioLegend, #100749, 53-6.7, Lot: B258589, 1:200 dilution
Nk1-1- BV785, BioLegend, #108749, PK136, Lot: B279624, 1:200 dilution
IgM- PE-Cy7, eBioscience, #25-5790-81, II/41, Lot: 2039912, 1:200 dilution
CD35, clone 8C12
Validation All fluorescent antibodies were validated as described on the manufacturers’ websites. HRP-conjugated antibodies were validated in-house
by ELISA measuring full length IgG1 antibody concentration of commercially purchased standards. mAbs produced by us in this
study were validated by SDS-PAGE, ELISA, spectrophotometry (nanodrop) and bio-layer interferometry (Octet Red 96) to ensure
proper expression, folding and concentration.
Authentication Cell lines were not authenticated; validation of functionality was established measuring the quantity and quality of the
produced antibody.
Mycoplasma contamination All cell lines tested negative for mycoplasma contamination.
Rosa26.Confetti (013731) and Rosa26.Stop.TdTomato (007914) mice were from The Jackson Laboratory. AicdaCreERT2 mice
were provided by Claude-Agnès Reynaud and Jean-Claude Weill (Institut Necker). S1pr2CreERT2 BAC transgenic mice were provided
by T. Okada (RIKEN Yokohama) and T. Kurosaki (U. Osaka). PA-GFP mice were generated by G. Victora and M. Nussenzweig
(Rockefeller University).
Field-collected samples This study did not involve samples collected from the field.
Ethics oversight All animal procedures were approved by the Institutional Animal Care and Use Committee of the Rockefeller University.
Note that full information on the approval of the study protocol must also be provided in the manuscript.
Flow Cytometry
Methodology
Sample preparation Cells were isolated by maceration with disposable micropestles (Axygen) in 100 μl of PBS supplemented with 0.5% BSA and 1
mM EDTA (PBE), and single cell suspensions obtained by two passes through a 70 μm mesh. Cells were stained with fluorescently
labeled antibodies on ice for 30 minutes.
Instrument Samples were run on a FACS LSRII or FACS Symphony (BD). For cell sorting, samples were run on a FACS Aria (BD).
Software BD FACSDiva software v8.0.2 was used for flow cytometry data acquisition. Data were analysed using the FlowJo software package (Tree Star,
USA) v10.5.3 and v8.7.
Cell population abundance Most cells were single cell index sorted into 96-well PCR plates, with single-cell precision. For bulk sorting, 100 cells per well were
sorted with single-cell precision.
Gating strategy All positive and negative populations were determined by compensation with single color controls. For sorting and analysis, all
lymphocytes were first gated based on SSC-A vs FSC-A, followed by 2 singlet gates (FSC-H vs FSC-A and SSC-H vs SSC-A). For GC
gating, cells were gated on either TCRbeta or Dump-, B220+, CD38-, Fas+ and interrogated for IgM, IgG1, IgG2b or IgA. For AID-
confetti sorting experiments cells were gated on SSC-A vs FSC-A in the same way as GC cells. TCRbeta-B220+CD38-Fas+ cells (GC
B cells) were then plotted as follows: CFP vs RFP, GFP vs YFP and all colored GC cells were single-cell index sorted. For PA-GFP
sorting experiments cells were gated as above for GC cells, then GFP+ (photoactivated) GC cells were single-cell index sorted. For
GF single GC experiments (AID-Confetti), cells were gated as described, but fluorescent and non-fluorescent
TCRbeta-B220+CD38-Fas+ cells were index sorted. For non-fluorescent sequencing analysis, cells were gated as above for GC cells and
single-cell index sorted. For bacterial flow cytometry, cells were gated on SSC-A vs FSC-A with only far outliers removed. Then,
SYTO+DAPI- live bacteria were assayed for mAb binding.
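The sequential logic of the germinal-centre gate described above (singlets, then TCRbeta-, B220+, CD38-, Fas+) can be illustrated with a small Python sketch. It is a simplification under stated assumptions: real gates are drawn on two-parameter plots in the acquisition and analysis software, whereas this sketch uses a single hypothetical cutoff per marker and one ratio-based singlet check, and it omits the SSC-A vs FSC-A lymphocyte gate and the second singlet gate. All names, cutoffs and event values are invented for illustration.

```python
def apply_gates(events, gates):
    """Apply gates in order; return surviving events and the count after each gate."""
    remaining = list(events)
    counts = []
    for name, keep in gates:
        remaining = [e for e in remaining if keep(e)]
        counts.append((name, len(remaining)))
    return remaining, counts


# Hypothetical fluorescence cutoffs separating negative from positive events.
CUTOFF = {"TCRb": 1000.0, "B220": 1000.0, "CD38": 1000.0, "Fas": 1000.0}


def positive(marker):
    return lambda e: e[marker] >= CUTOFF[marker]


def negative(marker):
    return lambda e: e[marker] < CUTOFF[marker]


GC_B_CELL_GATES = [
    # Singlets: pulse height roughly proportional to pulse area (hypothetical bounds).
    ("singlets (FSC-H vs FSC-A)", lambda e: 0.8 <= e["FSC_H"] / e["FSC_A"] <= 1.2),
    ("TCRbeta-", negative("TCRb")),
    ("B220+", positive("B220")),
    ("CD38-", negative("CD38")),
    ("Fas+", positive("Fas")),
]

# Four hypothetical events.
events = [
    {"FSC_A": 100.0, "FSC_H": 98.0, "TCRb": 50.0, "B220": 5000.0,
     "CD38": 200.0, "Fas": 4000.0},    # germinal-centre-like B cell
    {"FSC_A": 100.0, "FSC_H": 60.0, "TCRb": 50.0, "B220": 5000.0,
     "CD38": 200.0, "Fas": 4000.0},    # doublet-like event
    {"FSC_A": 100.0, "FSC_H": 99.0, "TCRb": 8000.0, "B220": 50.0,
     "CD38": 3000.0, "Fas": 100.0},    # T-cell-like event
    {"FSC_A": 100.0, "FSC_H": 101.0, "TCRb": 40.0, "B220": 5000.0,
     "CD38": 6000.0, "Fas": 100.0},    # non-germinal-centre B cell
]

gc_cells, counts = apply_gates(events, GC_B_CELL_GATES)
for name, n in counts:
    print(f"{name}: {n} events remaining")
```

Recording the count after each gate mirrors how gating hierarchies are usually reported and makes it easy to see which step removes most events.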
Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information.
Article
https://doi.org/10.1038/s41586-020-2772-0
Donald J. Benton1,6 ✉, Antoni G. Wrobel1,6 ✉, Pengqi Xu2,3, Chloë Roustan4, Stephen R. Martin1, Peter B. Rosenthal5, John J. Skehel1 & Steven J. Gamblin1 ✉
Received: 1 July 2020
Recognition of the ACE2 receptor by the membrane spike glycoprotein
of SARS-CoV-2 is a major determinant of virus infectivity, pathogenesis
and host range. Previous structural studies on the spike glycoproteins of
coronaviruses6,16–22 have shown that the spike trimer consists of a central
helical stalk, comprising three interacting S2 components, that is
covered at the top by S1. Each S1 component consists of two large domains,
the N-terminal domain (NTD) and receptor-binding domain (RBD), each
associated with a smaller intermediate subdomain. In virus membranes,
spike glycoproteins exist in a closed form, in which the RBDs cap the
top of the S2 core and are inaccessible to ACE2, and in an open form,
in which one S1 component has opened to expose the RBD for ACE2
binding6,16,18,23. Recent structural studies7,24,25 on the isolated RBD of
the SARS-CoV-2 spike protein in complex with ACE2 have provided
a molecular description of the receptor-binding interface. Although
some comparisons can be inferred from the previous cryo-electron
microscopy studies on the spike protein of SARS-CoV12,18,19,23, structures
of intact trimeric SARS-CoV-2 spike with bound ACE2 are needed to
determine the effects of binding on the overall spike conformation.
To examine this interaction between the SARS-CoV-2 spike protein
and its receptor, we mixed the ectodomains of furin-cleaved spike
with the ectodomains of ACE2 and incubated them for around 60 s
before plunge-freezing the mixture in liquid ethane for examination
by cryo-electron microscopy. In the images that we obtained, we could
resolve ten distinct species of spike and spike–ACE2 complexes (Fig. 1
and Extended Data Fig. 1), ranging from tightly closed, unbound
trimers to open trimers that formed complexes with three ACE2 molecules
and dissociated monomeric S1–ACE2 complexes. Of the spike trimers
analysed, two thirds were bound to ACE2 (Extended Data Fig. 1). Of
the unbound species, we observe good-quality particles in the closed
unbound conformation, equally compact to those reported in our
previous study26 and slightly more so than those described in previous
reports6,16. There are also considerable numbers (16% of all trimers)
of unbound particles with one erect RBD, as well as some (4%) in an
intermediate conformation, a less-compact closed form, with a single
disordered RBD, which have also been reported in a previous study of
the furin-cleaved spike protein26.
Of the spike trimers bound to the receptor, half accommodate one
ACE2 receptor. As previously reported for the SARS-CoV spike
protein12,23, the ACE2-bound RBD occupies a range of tilts with respect to
the long axis of the trimer (Extended Data Fig. 2a). Of the two RBDs per
trimer that are not engaged with the receptor, either both are closed or
one of the RBDs remains closed and one (either clockwise or
anticlockwise to the bound S1 (Extended Data Fig. 1)) is in the open conformation.
We were also able to identify, reconstruct and refine trimers to which
two or three ACE2 receptors were bound, in successively more open
structures (Fig. 1 and Extended Data Fig. 1).
1Structural Biology of Disease Processes Laboratory, Francis Crick Institute, London, UK. 2Precision Medicine Center, The Seventh Affiliated Hospital, Sun Yat-sen University, Shenzhen, China.
3Francis Crick Institute, London, UK. 4Structural Biology Science Technology Platform, Francis Crick Institute, London, UK. 5Structural Biology of Cells and Viruses Laboratory, Francis Crick
Institute, London, UK. 6These authors contributed equally: Donald J. Benton, Antoni G. Wrobel. ✉e-mail: donald.benton@crick.ac.uk; antoni.wrobel@crick.ac.uk; steve.gamblin@crick.ac.uk
Comparison of the trimers with one erect RBD that is either bound
or unbound by an ACE2 receptor revealed two things. First, ACE2
binding alters the position of the open RBD by a rigid-body rotation
of the domain that moves its centre of mass on average a further
approximately 5.5 Å away from the trimer axis, the NTD-associated and
RBD-associated subdomains of the same monomer shift around 1.9 Å
and about 2.3 Å, respectively (Extended Data Fig. 2c), and at the same
time the NTDs of all three S1 components move by around 1.5–3.0 Å
[Figure: Fig. 2, panels a–c. Labels in the image: S1, S2, RBD, NTD, putative FP; residues P295, F318, D614, W633, R634, Y636, R815, Y837 and K854; "Unfolded 827–855"; "Closed S1"; "ACE2-bound S1"; distance labels 8 Å and 3 Å.]
Fig. 2 | Structural rearrangements between the closed and the ACE2-bound
states of the spike protein. a, Surface representation of a monomer of S2 in
the one-ACE2-bound, two-RBD-closed state coloured in light pink with the S1
subunit of the adjacent monomer in ribbon representation; the S1 of the
one-ACE2-bound, two-RBD-closed state is shown in green and the
three-RBD-closed state (PDB 6ZGE 26) is shown in blue. The atoms on the surface of S2 that
contact the S1 intermediate domains are coloured in red. The arrows indicate
the direction of movements of the intermediate domains, and of the RBD,
between the closed and ACE2-bound conformations of the spike. b, Ribbon
representations of the NTD-associated intermediate domain in blue and the
moiety of the S2 chain that it interacts with (in red) in the closed conformation
of the spike. Essential residues that participate in the interaction are labelled;
of particular note is the salt bridge between Asp614 (S1, chain A) and Lys854
(S2, chain B). c, Ribbon representation of the same intermediate domain as
in b, but in the conformation observed in the ACE2-bound structure of the spike
(in green), in which the movement and refolding of the domain leads to a loss of
interaction with S2, which becomes disordered. The putative fusion peptide
(FP) and the S2′ site of the second protease cleavage at R815 adjacent to the
region that undergoes unfolding are shown in dark red.
Extended Data Fig. 1 | Surface representation of obtained structures. The three monomers of S in each trimer are coloured in blue, rosy brown and gold with
ACE2 shown in green. Relative percentages of all trimeric S particles used to calculate electron microscopy maps are shown.
Extended Data Fig. 2 | Features of the obtained spike structures. a, Two
three-dimensional classes, obtained by further classification of the
one-ACE2-bound closed state from Fig. 1, representative of the range of
motion of the RBD with bound ACE2, tilting away from the trimer axis of the
spike trimer. The tilt of the RBD and ACE2 is indicated with a dashed line.
b, Representative density of different obtained electron microscopy maps for
residues 996–1030 of S2. Built model shown in pink, with EM density shown as a
mesh. c, d, Comparison of spike structures for the open one-erect-RBD
structure (purple) with the one-ACE2-bound structure (orange). c, S1 domains
shown to highlight domain shifts of the RBD and RBD-associated intermediate
domain. d, Outwards movements of spike domains (excluding RBDs).
e, Comparison of RBD displacements of one-bound, two-bound and
three-bound RBDs after binding of ACE2 to the unbound open structure of the
spike protein (beige). These are compared to the RBD displacement after
binding of the C105 Fab fragment 27, which binds at the ACE2 interface of the
RBD (PDB: 6XCM).
Extended Data Fig. 3 | Cryo-electron microscopy data processing scheme.
Classes of particles used to obtain the final spike trimer structures, unbound
and in complex with ACE2, are surrounded by a box of the same colour as the
final maps shown at the bottom. The global resolution, final particle number
and percentage for each trimer species are shown at the bottom.
Extended Data Fig. 4 | Monomeric S1 bound to ACE2. a, Classification
scheme for the S1–ACE2 complex. b, c, Maps are shown of orthogonal views of
the non-uniform refinement (b) and unmasked refinement (c) of the final S1
particles. Domains are coloured as follows: green, ACE2; yellow, NTD; rosy
brown, RBD; pink, RBD subdomain; blue, NTD subdomain; cream, disseminated
density in b.
Extended Data Fig. 5 | Fourier shell correlation graphs for each of the determined structures. FSC, Fourier shell correlation.
Extended Data Fig. 6 | Maps and models of determined structures. Top, orthogonal views of electron microscopy density (grey) and ribbon diagram
representation of the models. Bottom, electron microscopy maps coloured by local resolution shown below.
Extended Data Table 1 | Buried interface surface area between monomers in different conformations
Different conformations of unbound and ACE2-bound trimers were analysed. The interface area was calculated using PISA. In the open and ACE2-bound conformations, chain A is the one to
open first and to bind the receptor first, then B follows, if the second RBD changes the conformation. Chain B is the chain anticlockwise to A when looking down the symmetry axis with the
membrane-proximal part at the bottom. The unbound and three-ACE2-bound molecules are of C3 symmetry.
Extended Data Table 2 | Cryo-electron microscopy data collection, refinement and validation statistics
nature research | reporting summary
Corresponding author(s): Donald Benton, Antoni Wrobel, Steven Gamblin
Last updated by author(s): Aug 30, 2020
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Data analysis CryoEM data were processed using the following packages: RELION-3.1, cryoSPARC v2.14, CTFfind4 v.4.1.10, MotionCor2 v.1.2.6, crYOLO v1.4,
Coot v.0.9, PHENIX v.1.17, UCSF Chimera v.1.12, UCSF ChimeraX v.0.5, CCP4MG v2.10, PISA v1.52
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers.
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability
Maps and models have been deposited in the Electron Microscopy Data Bank, http://www.ebi.ac.uk/pdbe/emdb/ and the Protein Data Bank,
https://www.ebi.ac.uk/pdbe/ with the following accession codes: EMD-11681 and PDB 7A91 (Dissociated S1 domain bound to ACE2 [Non-Uniform Refinement]);
EMD-11682 and PDB 7A92 (Dissociated S1 domain bound to ACE2 [Unmasked Refinement]); EMD-11683 and PDB 7A93 (SARS-CoV-2 S with 2 RBDs Erect);
EMD-11684 and PDB 7A94 (SARS-CoV-2 S with 1 ACE2 Bound); EMD-11685 and PDB 7A95 (SARS-CoV-2 S with 1 ACE2 Bound and 1 RBD Erect in Clockwise
Direction); EMD-11686 and PDB 7A96 (SARS-CoV-2 S with 1 ACE2 Bound and 1 RBD Erect in Anticlockwise Direction); EMD-11687 and PDB 7A97 (SARS-CoV-2 S with
2 ACE2 Bound); EMD-11688 and PDB 7A98 (SARS-CoV-2 S with 3 ACE2 Bound).
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf
Data exclusions CryoEM single particles were included and excluded within the image processing workflow using standard image processing techniques such
as 2D and 3D classifications, as detailed in Extended Data Figures 3 and 4.
Replication Structures were determined using independent half datasets, according to standard procedures in cryoEM. Images were collected from three
independent replicate prepared grids, which all produced similar images both by low resolution visual inspection and high resolution class
averages. There were no unsuccessful replications.
Randomization Not applicable to this study, as samples were not assigned to experimental groups and data were collected and processed according to
standard techniques for cryoEM.
Blinding Not applicable to this study, as there was no experimental group allocation in data collection and analysis.
Mycoplasma contamination Cell line was not tested for mycoplasma contamination
Commonly misidentified lines Name any commonly misidentified cell lines used in the study and provide a rationale for their use.
(See ICLAC register)
October 2018
Article
Most deaths from cancer are explained by metastasis, and yet large-scale metastasis
research has been impractical owing to the complexity of in vivo models. Here we
introduce an in vivo barcoding strategy that is capable of determining the metastatic
potential of human cancer cell lines in mouse xenografts at scale. We validated the
robustness, scalability and reproducibility of the method and applied it to 500 cell
lines (refs. 1,2) spanning 21 types of solid tumour. We created a first-generation metastasis
map (MetMap) that reveals organ-specific patterns of metastasis, enabling these
patterns to be associated with clinical and genomic features. We demonstrate the
utility of MetMap by investigating the molecular basis of breast cancers capable of
metastasizing to the brain—a principal cause of death in patients with this type of
cancer. Breast cancers capable of metastasizing to the brain showed evidence of
altered lipid metabolism. Perturbation of lipid metabolism in these cells curbed brain
metastasis development, suggesting a therapeutic strategy to combat the disease and
demonstrating the utility of MetMap as a resource to support metastasis research.
Human cancer cell lines have been a driving force in cancer research, leading to the discovery of oncogenic mechanisms and therapeutic targets (refs. 1–4). However, large-scale characterization of cell lines has been limited to rudimentary readouts such as viability in cell culture, because more complex phenotypes, such as behaviours in vivo, have not been tractable at scale. By contrast, most studies of metastasis rely on only a small number of experimental models (refs. 5–9), thereby making it difficult to extrapolate findings to genetically diverse human tumours (ref. 10).

Ideally, it would be possible to construct a map of organ-specific metastatic potential of hundreds of human cancer cell lines using xenograft models, so that the molecular features of the cell lines could be related to their ability to survive and proliferate in organ-specific microenvironments. However, the prospect of in vivo testing of each cell line individually is unattractive, because it is labour-intensive and expensive, as well as because of the difficulty in sufficiently controlling for variability between animal experiments. We proposed that if cell lines were labelled with molecular barcodes and injected into recipient mice as a pool, internally controlled, metastatic potential could be assessed in a highly scalable manner.

Pilot study with breast cancer
To test the feasibility and reliability of in vivo barcoding to monitor growth in different tissues in mice, we performed a pilot study using four breast cancer cell lines (Fig. 1a, Extended Data Fig. 1, Supplementary Note 1). Each cell line was engineered to express a unique 26-nucleotide barcode, together with luciferase for in vivo imaging and either GFP or mCherry to facilitate subsequent cell sorting and measurement of reproducibility within a single mouse (Extended Data Fig. 1a, Supplementary Table 1). The 8 barcoded lines were injected as a pool into the left ventricle of 5–6-week-old NOD-SCID-gamma (NSG) mice so as to focus our analysis on the ability of tumour cells to exit circulation and undergo expansion in distant organs. Bioluminescence imaging (BLI) revealed metastatic lesions throughout the body (Extended Data Fig. 1b). Five weeks after injection, brain, lung, liver, kidney and bone were collected, human tumour cells were isolated by fluorescence-activated cell sorting (FACS) using GFP or mCherry, and barcodes were quantified using RNA sequencing (RNA-seq) (Extended Data Fig. 1c–g). Whereas barcode abundances were similar pre-injection, some barcodes were enriched in specific organs (Extended Data Fig. 1h). Different cell lines exhibited distinct patterns of metastatic spread, but each cell line showed a highly similar pattern of spread across multiple mice independent of whether GFP or mCherry versions were used, demonstrating the reproducibility of this pooled approach (Extended Data Fig. 1d). For example, HCC1954 was most strongly detected in brain, whereas extracranial metastases were dominated by MDAMB231. Barcodes quantified by bulk RNA-seq were independently validated by quantitative PCR with reverse transcription (RT–qPCR) and single-cell RNA-seq (Extended Data Fig. 1i–m, Supplementary Note 1).

Having validated the method, we next characterized the metastatic behaviours of all 21 basal-like breast cancer cell lines in the Cancer Cell Line Encyclopedia (CCLE) (Extended Data Fig. 1a–d). Basal-like breast cancers are known to have diverse metastatic abilities in patients (ref. 11). Reflecting this diversity, the cell lines showed disparate metastatic patterns: pan-metastatic, metastatic preferentially to particular organs or not metastatic (Fig. 1b, Supplementary Table 2). Notably, one cell line (BT20) was detected in multiple organs, but at very low abundance in all of them, reflecting its ability to colonize but not expand. To validate the patterns of metastasis observed in the pooled in vivo system, we selected
1 Broad Institute of MIT and Harvard, Cambridge, MA, USA.
2 Koch Institute for Integrative Cancer Research, Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA.
3 Edwin L. Steele Laboratories, Department of Radiation Oncology, Massachusetts General Hospital, Boston, MA, USA.
4 Institute for Medical Engineering and Science, Picower Institute for Learning and Memory, Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA.
5 Harvard Medical School, Boston, MA, USA.
6 Dana-Farber Cancer Institute, Boston, MA, USA.
✉e-mail: xjin@broadinstitute.org; golub@broadinstitute.org
eight cell lines for individual characterization, and observed similar results from the pooled and individual screens (Extended Data Fig. 1n, o).

A metastasis map of 500 human cancer cell lines
Having demonstrated its feasibility in breast cancer, we attempted to expand the mapping of metastatic potential to human cancer cell lines from diverse lineages. To facilitate higher-throughput profiling, we used cell lines barcoded for use with the PRISM method, which was developed for in vitro drug-sensitivity screening (ref. 12). A simplified workflow enabled the quantitative detection of barcodes from crude tissue lysates without the need for FACS-based tumour cell purification (Extended Data Fig. 2, Supplementary Note 2). We applied this method to 503 cell lines spanning 21 lineages to develop a first-generation Metastasis Map (MetMap) (Fig. 2a). The data and interactive visualization are publicly accessible at https://pubs.broadinstitute.org/metmap.

To test the robustness of the MetMap dataset, we tested cell lines in two formats: in one, we injected all 498 cell lines as a single pool; in the other, we injected 5 pools of 25 lines, with each pool being injected into different mice (referred to as MetMap500 and MetMap125, respectively) (Fig. 2b). We similarly varied cell numbers, mouse age and cohort size to determine whether results varied substantially with these parameters. We observed strong correlation of the metastatic potential despite differences in experimental conditions (Fig. 2c), suggesting that the approach is extremely robust. We also note that intracardiac injection enabled the evaluation of many more cell lines in vivo compared with subcutaneous injection. Specifically, we recovered an average of 197 cell lines per mouse following intracardiac injection, whereas an average of 42 cell lines were recovered following subcutaneous injection (Extended Data Fig. 3a–c). We suspect that this difference is explained by the local competition for nutrients and other microenvironmental factors in the subcutaneous setting, whereas the spatial separation of tumour cells delivered through the intracardiac route minimizes

system. Cell lines derived from metastases showed higher metastatic potential than lines derived from primary tumours, although lines derived from primary tumours known to later give rise to metastases in patients were metastatic in the MetMap (Fig. 3b), consistent with previously reported suggestions that metastatic potential is already encoded in primary tumours (refs. 17–19). The association between decreased metastatic potential and increased patient age was unexpected (Fig. 3c), and its basis remains to be determined.

Perhaps most importantly, extensive variation in metastatic potential was observed within individual lineages, making it possible to search for associations between metastasis propensity and genomic features of the tumours. Of note, metastatic potential was not simply explained by proliferation rate or mutational burden (Fig. 3f–h, Extended Data Fig. 4f, g), suggesting that more subtle molecular determinants of metastasis were involved.

Molecular correlates of brain metastasis
To develop mechanistic insights, we focused on breast cancer and its potential for brain metastasis (Fig. 1b), because brain metastasis is a feature of some, but not all, breast cancers, and little is known about the underlying factors that could inform therapeutic approaches (refs. 20,21). We therefore undertook a systematic and unbiased comparison of the molecular features that distinguished brain metastatic versus non-metastatic lines, using genomic data available for each of the cell lines.

At the level of somatic mutations, PIK3CA was the top associated correlate: 4 out of 7 brain metastatic lines contained a PIK3CA mutation, compared with 0 out of 14 non-metastatic or weakly metastatic lines (false discovery rate (FDR) = 0.0034) (Fig. 4a, Extended Data Fig. 5a). A fifth line, HCC70, has a loss-of-function mutation in PTEN. PI3K is a principal downstream mediator of ERBB2 (also known as HER2), which itself has been reported to be associated with brain metastasis in humans (refs. 11,20). Indeed, two of the brain metastatic cell lines (JIMT1 and HCC1954) also contain typical ERBB2 gene amplifications (Extended Data Fig. 5a).

At the level of DNA copy number, we observed an association between metastatic potential and deletions of chromosome 8p12–8p21.2 (referred to as 8p) (FDR = 0.0017) (Fig. 4b). Five out of seven brain metastatic breast cancer cell lines contained deletions in this region, compared with 0 out of 14 non-metastatic lines (Extended Data Fig. 5a). A sixth metastatic line, JIMT1, has a small deletion within this commonly deleted region.

To ascertain the clinical relevance of these associations, we analysed clinical breast cancer datasets for which metastasis information was available (ref. 18). We observed a strong correlation between 8p copy number and gene expression in the METABRIC and TCGA datasets (refs. 22,23) (Extended Data Fig. 6a), thereby validating 8p expression as a surrogate for copy number in datasets for which copy number data were not available.

[Fig. 1 graphic not reproduced; legend key: organ, metastatic potential, penetrance.]
Fig. 1 | Scalable in vivo metastatic potential mapping with barcoded cell line pools. a, A schematic of the experiment determining the feasibility of in vivo metastatic potential profiling using barcoded cell line pools. Barcode abundance reflecting cancer cell compositions was determined by RNA-seq, and the cell number of each cell line was inferred by cancer cell composition and total cancer cell counts isolated from the target organ. b, Petal plots displaying the metastatic patterns of 21 basal-like breast cancer cell lines. Petal length represents metastatic potential, quantifying the mean of inferred cancer cell numbers detected from the target organs. Data are mean ± 95% confidence interval. Petal width shows penetrance, quantifying percentage of mice detected with the cell line.
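The quantities defined in the Fig. 1 legend can be illustrated with a minimal sketch. It assumes that the inferred cell number of a line is its share of barcode reads multiplied by the total cancer cell count recovered from the organ, that metastatic potential is the mean of those numbers across mice, and that a line counts as "detected" in a mouse when its inferred number is above zero. The function names, the detection threshold and all input values are hypothetical and are not taken from the MetMap code.

```python
from statistics import mean


def infer_cell_numbers(barcode_counts, total_cells):
    """Scale each line's share of barcode reads by the total cancer cells recovered."""
    total_reads = sum(barcode_counts.values())
    if total_reads == 0:
        return {line: 0.0 for line in barcode_counts}
    return {line: total_cells * n / total_reads for line, n in barcode_counts.items()}


def summarize(per_mouse):
    """per_mouse: list of {line: inferred cell number} dicts, one dict per mouse."""
    lines = sorted({line for mouse in per_mouse for line in mouse})
    summary = {}
    for line in lines:
        values = [mouse.get(line, 0.0) for mouse in per_mouse]
        summary[line] = {
            # Petal length: mean inferred cell number across mice.
            "metastatic_potential": mean(values),
            # Petal width: fraction of mice in which the line was detected
            # (assumed here to mean an inferred cell number above zero).
            "penetrance": sum(v > 0 for v in values) / len(values),
        }
    return summary


# Hypothetical brain samples from three mice; read counts and cell totals are made up.
mice = [
    infer_cell_numbers({"HCC1954": 900, "MDAMB231": 100, "BT20": 0}, total_cells=2000),
    infer_cell_numbers({"HCC1954": 800, "MDAMB231": 200, "BT20": 0}, total_cells=1000),
    infer_cell_numbers({"HCC1954": 500, "MDAMB231": 0, "BT20": 0}, total_cells=400),
]
brain = summarize(mice)
```

Keeping the two summaries separate mirrors the petal plot: a line can be found in every mouse at low abundance (high penetrance, low potential) or expand strongly in only some mice.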
[Fig. 2 graphic not reproduced. Recoverable panel text: a, lineages shown: neuroblastoma, head and neck, mesothelioma, endometrium, oesophageal, pancreatic, melanoma, colorectal, sarcoma, bile duct, prostate, bladder, ovarian, thyroid, gastric, kidney, breast, bone, brain, lung and liver, with each line marked as derived from primary tumour or metastasis; b, conditions listed in two columns: ~500 versus ~10k cells per line; 15 mice versus ~5 mice per pool; 8–10 versus 5–6 weeks old, female; 5 versus 4 organs; 120 cell lines and 4 organs shared; c, axes labelled MetMap500 and MetMap125, with reported correlations of Pearson's r = 0.75 (P < 2.2 × 10^−16), r = 0.84 (P < 2.2 × 10^−16) and r = 0.60 (P = 5.1 × 10^−13).]
Fig. 2 | Drafting MetMap for 500 human cancer cell lines. a, A schematic of the workflow using pan-cancer PRISM cell line pools for high-throughput metastatic potential profiling. Relative metastatic potential was quantified by deep sequencing of PRISM barcode abundance from tissue. The cancer lineage distribution of the profiled 500 cancer cell lines is presented, with each dot representing a cell line, and showing whether the cell line was derived from primary tumour or metastasis. b, Comparison of experimental conditions between MetMap500 and MetMap125. c, Scatter plots showing overall and organ-specific metastatic potential as determined in MetMap500 and MetMap125. Strong correlation is observed between the two experiments. Each dot represents a cell line. Cancer lineage is colour-coded as in a.
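The comparison in Fig. 2c rests on a Pearson correlation between per-line metastatic potentials from the two experiments. The following is a minimal sketch of that calculation, assuming the correlation is taken on log10-transformed values with a small floor for undetected lines; the transform, the floor of 1e-4 and the four example values are assumptions for illustration, not the authors' stated procedure or data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log10_potential(values, floor=1e-4):
    """Log-transform metastatic potentials, clamping small values to an assumed floor."""
    return [math.log10(max(v, floor)) for v in values]


# Hypothetical potentials for four cell lines shared by both experiments.
metmap500 = [1e-4, 1e-2, 1.0, 100.0]
metmap125 = [1e-3, 1e-2, 10.0, 100.0]

r = pearson_r(log10_potential(metmap500), log10_potential(metmap125))
```

Working on the log scale keeps a few highly metastatic lines from dominating the coefficient when potentials span several orders of magnitude.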
esis, we repeated the experiment in culture medium prepared with delipidated serum, which prevented the increase in TAGs observed in SREBF1-knockout cells (Extended Data Fig. 7).

To further explore the role of SREBF1, we performed RNA-seq following SREBF1 knockout and found SCD (ref. 35) to be the most consistently downregulated gene (Fig. 4i). Consistent with this, SCD was the top co-dependency of SREBF1 across 688 cell lines in the genome-wide CRISPR–Cas9 viability screens (Fig. 4j). The next highest scoring SREBF1 co-dependency was SCAP, which encodes the upstream activator of SREBF1 (ref. 35). Comparison of gene expression in breast cancer cells grown in vitro or in the brain similarly showed that in the brain, cells adopted gene-expression signatures of adipogenesis, fatty acid metabolism and xenobiotic metabolism (Extended Data Fig. 8, Supplementary Note 3). The enrichment of lipid-metabolism signatures (including upregulation of SREBF1 and SCD) was unique to brain compared with other sites of metastasis. Similar upregulation was also observed in

[Fig. 3 graphic not reproduced. Recoverable panel text (y axis: metastatic potential): comparisons of lines derived from primary tumour, primary tumour with metastasis, and metastasis; c, Age: P = 0.0077; d, Gender: P = 0.55; e, Ethnicity: P = 0.91; f, Doubling time (h): P = 0.058; g, Mutation burden: P = 0.52; h, Aneuploidy: P = 0.23.]
[Fig. 4 graphic not reproduced. Recoverable labels include the gene sets Hallmark: PI3K–AKT–MTOR signaling; GO: ERBB signalling pathway; GO: ERBB2 signalling pathway; GO: carnitine metabolic process; Reactome: mitochondrial fatty acid β-oxidation; and GO: short chain fatty acid metabolic process; and the panel titles f, CRISPR dependency; h, Lipidomics: SREBF1-KO vs WT; i, RNA-seq: SREBF1-KO vs WT; j, SREBF1 co-dependency.]
Fig. 4 | An altered lipid-metabolism state associates with brain metastatic potential in basal-like breast cancer. a, Somatic mutations that associate with brain metastatic potential in the basal-like breast cancer cohort. The top correlate, PIK3CA, reaches statistical significance (FDR = 0.0034, highlighted in bold). All PIK3CA mutations are activating. Positive correlations are in red, negative correlations are in blue. Selected known oncogenes or tumour suppressors in basal-like breast cancer are presented for comparison. b, Alterations in copy number that associate with brain metastatic potential. The top correlates cluster in chr 8p12–8p21.2 (FDR = 0.0017, highlighted in bold). c, Gene-expression signatures that associate with brain metastatic potential. Bars indicate P values. Expression signature scores were projected for each cell line with their in vitro RNA-seq data and used for regression analysis. GO (Gene Ontology), Hallmark, Reactome and Burton are gene sets in the MSigDB gene set enrichment analysis (GSEA) collection. d, Lipid-metabolite species that associate with brain metastatic potential. Bars indicate P values. Lipid metabolites measured by mass spectrometry were grouped by species, and enrichment analysis of the species was performed using GSEA. CE, cholesterol ester; PC, phosphatidylcholine; SM, sphingomyelin; LPC, lysophosphatidylcholine; LPE, lysophosphatidylethanolamine; DAG, diacylglycerol; PPP, pentose phosphate pathway metabolites. e, Heat map presenting distribution of lipid species measured by mass spectrometry from different mouse tissues. Gastroc, gastrocnemius. f, CRISPR gene dependencies that associate with brain metastatic potential. The top gene, SREBF1 (FDR = 0.001), is a selective dependency in highly brain metastatic lines. Positive correlations are in red, negative correlations are in blue. g, Distribution of SREBF1 (top) and SREBF2 (bottom) dependencies across 688 human cancer cell lines. The positions of highly brain metastatic (met) breast lines are highlighted in red, whereas weakly metastatic or non-brain metastatic breast lines are highlighted in blue. h, Consensus alterations in lipid species abundance upon SREBF1 knockout (KO) in JIMT1 and HCC1806, two brain metastatic cell lines. Bars indicate adjusted P values. Lipid metabolites measured by mass spectrometry were grouped by species, and enrichment analysis of the species was performed using GSEA. WT, wild type. i, Consensus gene-expression changes upon SREBF1 knockout in JIMT1, HCC1806, HCC1954 and MDAMB231, four brain metastatic cell lines. The two top genes are SREBF1 and SCD (FDR < 0.05, highlighted in bold). j, Co-dependencies of SREBF1 across 688 human cancer cell lines in genome-wide CRISPR viability screen. The two top genes are SCD and SCAP (FDR < 1 × 10^−60, highlighted in bold).
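For a single gene, a mutation association such as the PIK3CA result in panel a can be checked by hand with a 2 × 2 contingency test. The sketch below applies a two-sided Fisher's exact test to the counts given in the main text (4 of 7 brain metastatic lines mutated versus 0 of 14 non-metastatic or weakly metastatic lines). It is an illustration only: the reported FDR of 0.0034 comes from the authors' own association analysis across many genes, which this single-table test does not reproduce, and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Probability that k of the col1 "successes" fall in the first row.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_observed = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_observed * (1 + 1e-9)
    )


# Rows: brain metastatic (7 lines), non/weakly metastatic (14 lines).
# Columns: PIK3CA mutant, PIK3CA wild type.
p_value = fisher_exact_two_sided(4, 3, 0, 14)
```

With these counts only the observed table is as extreme as itself, so the two-sided P value equals the single hypergeometric term 35/5985, or about 0.006.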
[Fig. 5 graphic not reproduced. Recoverable labels: a, guide dropout quantification, guide amplification from brain lesion, intracranial injection; b, axes effect size and –log10 (adj. P), with labelled genes UBIAD1, IRX3, SCD, SCAP, ACLY and SREBF1; c, JIMT1 intracranial: WT, SREBF1-KO, SCAP-KO, SCD-KO, ACLY-KO, PMVK-KO and IRX3-KO, two guides (g1, g2) each, BLI against days post injection; d, JIMT1 intracardiac: brain, lung, liver, bone, whole body; e, JIMT1 intracarotid.]
Fig. 5 | Investigation of lipid-metabolism genes in breast cancer brain metastasis. a, A schematic of an in vivo CRISPR screen investigating relative gene fitness in brain metastasis outgrowth. b, Volcano plot showing the result of a mini-pool in vivo CRISPR screen targeting 29 lipid-metabolism-related genes. Thirteen genes scored at FDR < 0.05, with selective hits highlighted. c, Individual gene validation of six hits by intracranial injection of JIMT1 edited cells. Cell outgrowth in brain metastasis was monitored by real-time BLI. Two independent guides per gene were tested, with one guide per mouse. d, BLI and quantification of relative fold change in metastasis load in the organs of mice receiving intracardiac injection of wild-type (WT) or SREBF1-knockout (KO) JIMT1 cells. Data are mean ± s.e.m. Each group contains five mice. e, BLI and quantification of relative fold change in brain metastasis load in mice receiving intracarotid injection of wild-type or SREBF1-KO JIMT1 cells. Data are mean ± s.e.m. n = 7 (wild-type) and n = 8 (knockout) mice.

large repertoire of models for exploration of metastasis mechanisms. A limitation of the use of human cell lines for such experiments is that they require the use of immunodeficient mice. The extent to which the immune system has a role in mediating patterns of metastasis remains to be determined (ref. 37).

We followed up only a small proportion of the MetMap findings: specifically, breast cancer metastasis to brain. Multiple lines of experimental and clinical evidence pointed to a role of lipid metabolism in governing the ability of cells to survive in the brain microenvironment. The importance of lipid metabolism in cancer has been highlighted by a number of studies, but its role in brain metastasis has, to our knowledge, not been fully appreciated (refs. 38–41). The possibility that interfering with lipid or cholesterol metabolism might abrogate metastatic growth in the brain is particularly intriguing. More generally, this work illustrates the complex interplay between cancer cell growth and the tissue microenvironment.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41586-020-2969-2.

1. Barretina, J. et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603–607 (2012).
… cancer cells. Nature 483, 570–575 (2012).
4. Behan, F. M. et al. Prioritization of cancer therapeutic targets using CRISPR–Cas9 screens. Nature 568, 511–516 (2019).
5. Kang, Y. et al. A multigenic program mediating breast cancer metastasis to bone. Cancer Cell 3, 537–549 (2003).
6. Chen, S. et al. Genome-wide CRISPR screen in a mouse model of tumor growth and metastasis. Cell 160, 1246–1260 (2015).
7. Malladi, S. et al. Metastatic latency and immune evasion through autocrine inhibition of WNT. Cell 165, 45–60 (2016).
8. van der Weyden, L. et al. Genome-wide in vivo screen identifies novel host regulators of metastatic colonization. Nature 541, 233–236 (2017).
9. Tasdogan, A. et al. Metabolic heterogeneity confers differences in melanoma metastatic potential. Nature 577, 115–120 (2020).
10. Robinson, D. R. et al. Integrative clinical genomics of metastatic cancer. Nature 548, 297–303 (2017).
11. Kennecke, H. et al. Metastatic behavior of breast cancer subtypes. J. Clin. Oncol. 28, 3271–3277 (2010).
12. Yu, C. et al. High-throughput identification of genotype-specific cancer vulnerabilities in mixtures of barcoded tumor cell lines. Nat. Biotechnol. 34, 419–423 (2016).
13. Budczies, J. et al. The landscape of metastatic progression patterns across major human
15. Fonkem, E., Lun, M. & Wong, E. T. Rare phenomenon of extracranial metastasis of glioblastoma. J. Clin. Oncol. 29, 4594–4595 (2011).
16. Stone, K. R., Mickey, D. D., Wunderli, H., Mickey, G. H. & Paulson, D. F. Isolation of a human prostate carcinoma cell line (DU 145). Int. J. Cancer 21, 274–281 (1978).
17. Ramaswamy, S., Ross, K. N., Lander, E. S. & Golub, T. R. A molecular signature of metastasis in primary solid tumors. Nat. Genet. 33, 49–54 (2003).
18. Zhang, X. H.-F. et al. Selection of bone metastasis seeds by mesenchymal signals in the primary tumor stroma. Cell 154, 1060–1073 (2013).
19. Campbell, P. J. et al. The patterns and dynamics of genomic instability in metastatic pancreatic cancer. Nature 467, 1109–1113 (2010).
20. Witzel, I., Oliveira-Ferrer, L., Pantel, K., Müller, V. & Wikman, H. Breast cancer brain metastases: biology and new clinical perspectives. Breast Cancer Res. 18, 8 (2016).
21. Kodack, D. P., Askoxylakis, V., Ferraro, G. B., Fukumura, D. & Jain, R. K. Emerging strategies for treating brain metastases from breast cancer. Cancer Cell 27, 163–175 (2015).
22. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012).
23. Cancer Genome Atlas Network. Comprehensive molecular portraits of human breast tumours. Nature 490, 61–70 (2012).
24. Razavi, P. et al. The Genomic Landscape of Endocrine-Resistant Advanced Breast Cancers. Cancer Cell 34, 427–438 (2018).
25. Gatza, M. L. et al. A pathway-based classification of human breast cancer. Proc. Natl Acad. Sci. USA 107, 6994–6999 (2010).
26. Creighton, C. J. et al. Proteomic and transcriptomic profiling reveals a link between the PI3K pathway and lower estrogen-receptor (ER) levels and activity in ER+ breast cancer. Breast Cancer Res. 12, R40 (2010).
27. Ricoult, S. J. H., Yecies, J. L., Ben-Sahra, I. & Manning, B. D. Oncogenic PI3K and K-Ras stimulate de novo lipid synthesis through mTORC1 and SREBP. Oncogene 35, 1250–1260 (2016).
28. Cai, Y. et al. Loss of chromosome 8p governs tumor progression and drug response by altering lipid metabolism. Cancer Cell 29, 751–766 (2016).
29. Li, H. et al. The landscape of cancer cell line metabolism. Nat. Med. 25, 850–860 (2019).
30. Patra, K. C. & Hay, N. The pentose phosphate pathway and cancer. Trends Biochem. Sci. 39, 347–354 (2014).
31. Jain, M. et al. A systematic survey of lipids across mouse tissues. Am. J. Physiol. Endocrinol. Metab. 306, E854–E868 (2014).
32. Piomelli, D., Astarita, G. & Rapaka, R. A neuroscientist’s guide to lipidomics. Nat. Rev. Neurosci. 8, 743–754 (2007).
33. Paget, S. The distribution of secondary growths in cancer of the breast. 1889. Cancer Metastasis Rev. 8, 98–101 (1989).
34. Dempster, J. M. et al. Agreement between two large pan-cancer CRISPR–Cas9 gene dependency data sets. Nat. Commun. 10, 5817 (2019).
35. Horton, J. D., Goldstein, J. L. & Brown, M. S. SREBPs: activators of the complete program of cholesterol and fatty acid synthesis in the liver. J. Clin. Invest. 109, 1125–1131 (2002).
36. Varešlija, D. et al. Transcriptome characterization of matched primary breast and brain metastatic tumors to detect novel actionable targets. J. Natl. Cancer Inst. 111, 388–398 (2019).
37. Angelova, M. et al. Evolution of metastases in space and time under immune selection. Cell 175, 751–765 (2018).
38. Zhang, M. et al. Adipocyte-derived lipids mediate melanoma progression via FATP proteins. Cancer Discov. 8, 1006–1025 (2018).
39. Zou, Y. et al. Polyunsaturated fatty acids from astrocytes activate PPARγ signaling in cancer cells to promote brain metastasis. Cancer Discov. 9, 1720–1735 (2019).
40. Pascual, G. et al. Targeting metastasis-initiating cells through the fatty acid receptor CD36. Nature 541, 41–45 (2017).
41. Sullivan, M. R. et al. Quantification of microenvironmental metabolites in murine cancers reveals determinants of tumor nutrient availability. eLife 8, e44235 (2019).

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© The Author(s), under exclusive licence to Springer Nature Limited 2020
and brain metastasis RNA-seq was obtained from ref. 36. To exclude the confounding effect of brain stroma contamination in this dataset, a contamination indicator generated from GSE52604 was applied, and the contaminating effect was regressed out, generating a corrected gene matrix. PI3K-response signatures were from refs. 25,26. Signature analysis was conducted as described (ref. 7). Hierarchical clustering and heatmaps were generated using the gplots package. Other plots were generated using ggplot2. Log-rank tests of survival curve difference were calculated using the survival package. A multivariate Cox proportional hazards model was built using the coxph function (Extended Data Fig. 6h). Significance of overlap was calculated using the chisq.test or fisher.test function.

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability
MetMap data and interactive visualization can be accessed at https://pubs.broadinstitute.org/metmap. RNA-seq data generated from this study have been deposited in the Gene Expression Omnibus (GEO) under accession numbers GSE148283 and GSE148372. Additional datasets used in this study include METABRIC, TCGA and MSK-targeted-sequencing breast cancer datasets from cBioPortal, the EMC-MSK dataset (GSE2035, GSE2603, GSE5327 and GSE12276), the 65-metastasis-sample dataset (GSE14020), paired primary tumour and brain metastasis RNA-seq from ref. 36, and GSE52604. Source data are provided with this paper.

Code availability
Custom codes used for this study are accessible at the MetMap portal (https://pubs.broadinstitute.org/metmap).

Acknowledgements We thank J. L. Goldstein for suggestions; A. Regev, N. Marjanovic and A. Bankapur for assistance with single-cell RNA-seq and analysis; Z. Herbert for assistance with RNA-seq; T. Mason for assistance with next-generation sequencing; S. Kim and S. Roberge for assistance with animal work; B. Wong for suggestions on figure and portal designs; and G. Wei, U. Ben-David, S. Corsello, P. Tsvetkov, I. Tirosh, R. Hosking and C. Mader for discussions. X.J. and G.B.F. were Susan G. Komen Fellows. A.A. is a HHMI Medical Research Fellow. This work was supported by BroadNext10, Broad SharkTank grants (X.J.), HHMI (T.R.G.), and in part by Koch Institute/DFHCC Bridge project grant (M.G.V.H. and R.K.J.). R.K.J. acknowledges support from the NIH (R35CA197742, R01CA208205 and U01CA224173), National Foundation for Cancer Research; the Ludwig Center at Harvard; the Jane’s Trust Foundation; the Advanced Medical Research Foundation and by the U.S. Department of Defense Breast Cancer Research Program Innovator Award W81XWH-10-1-0016.

Author contributions X.J. conceptualized the project, conducted experiments, collected data and analysed results. Z.D. assisted with experiments. K.N. and T.N. assisted with bioinformatic and RNA-seq analysis. A.A., A.D., C.B.C. and M.G.V.H. performed lipidomics and data interpretation. G.B.F. and R.K.J. performed intracranial injection experiments and data analysis. L.P. and A.A.T. assisted with petal plot and portal development. C.Z., L.W., D.R. and J.R. assisted with PRISM assay and data generation. V.M. and K.C. performed tissue imaging, data acquisition and analysis. T.R.G. supervised the research. X.J. and T.R.G. wrote the manuscript.

Competing interests T.R.G. receives research funding unrelated to this project from Bayer HealthCare, Novo Ventures and Calico Life Sciences; holds equity in FORMA Therapeutics; is a consultant to GlaxoSmithKline; and is a founder of Sherlock Biosciences. M.G.V.H. is a scientific advisory board member for Agios Pharmaceuticals, Aeglea Biotherapeutics, Auron Therapeutics and iTeos Therapeutics. R.K.J. received an honorarium from Amgen; consultant fees from Chugai, Merck, Ophthotech, Pfizer, SPARC and SynDevRx; owns equity in Accurius, Enlight, Ophthotech and SynDevRx; and serves on the Boards of Trustees of Tekla Healthcare Investors, Tekla Life Sciences Investors, Tekla Healthcare Opportunities Fund and Tekla World Healthcare Fund. No reagents or funding from these organizations were used in this study. X.J. and T.R.G. are named as inventors on pending PCT Patent Application No. PCT/US20/29584 filed by The Broad Institute, which describes compositions and methods for characterizing the metastatic potential of cancer cell lines. The other authors declare no competing interests.

Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2969-2.
Correspondence and requests for materials should be addressed to X.J. or T.R.G.
Peer review information Nature thanks Roger Gomis, Jason Locasale, Ultan McDermott and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Reprints and permissions information is available at http://www.nature.com/reprints.
Extended Data Fig. 3 | Subcutaneous injection of PRISM cell line pool. a, The same PRISM pool of 498 cell lines used for MetMap500 profiling was tested using subcutaneous injection on a cohort of 6 mice. Survival curves compare animal survival difference between subcutaneous and intracardiac (IC) injections, P value calculated using two-sided, log-rank test. b, Total numbers of cell lines detected in animals from the subcutaneous and IC injections. Detected lines are coloured in pink and non-detected lines are coloured in light-blue. P value calculated using two-sided t-test. c, Scatter plot showing barcode-quantified tumorigenic potential and metastatic potential from subcutaneous and IC experiments respectively. d, Group1 of basal breast cancer pool (Extended Data Fig. 1a) was subjected to mammary fat pad injection, barcode quantitation through RNA-Seq, and cell number inference.
Extended Data Fig. 4 | Association of overall metastatic potential with clinical parameters. a, Bar plots showing significance of single variate and multi variate association analysis with metastatic potential in MetMap500. P values are calculated using linear regression and Anova (type II) of the linear models. The dotted lines indicate 0.05 cutoff. b, Box plots showing metastatic potential of cell lines stratified by metastasis status in the corresponding patients and cancer lineage. Box plots display quartiles of the data. Outlier points extend beyond 1.5 × interquartile ranges from either hinge. c, Scatter plots showing correlation of metastatic potential with patient age, stratified by cancer lineage. An inverse correlation was observed in several cancer types. d–g, Correlation of overall metastatic potential with derived site (d), time length in culture to derive the cell lines (e), mutation burden (f) and cell doubling (g) in the 21 basal breast cancer cohort. d, P value calculated using two-sided t-test. e–g, Pearson’s correlation coefficients and test P values are presented.
Extended Data Fig. 5 | Genetic correlates of brain metastatic potential in basal-like breast cancer. a, A line-by-line view of brain metastatic potential and its associated features at genetic, expression, metabolite, and gene dependency levels. Mutation: mutant (MUT), wild-type (WT). Copy number: data are binarized, with deletion (DEL) cutoff ≤ -1 and amplification (AMP) cutoff ≥ 1. Expression signatures: 1. Hallmark: PI3K/AKT/MTOR signalling, 2. GO: ERBB signalling pathway, 3. GO: ERBB2 signalling pathway, 4. Burton: adipogenesis peak at 8hr, 5. GO: carnitine metabolic process, 6. Reactome: mitochondrial fatty acid beta oxidation, 7. GO: short chain fatty acid metabolic process. Data not available for the cell lines are marked with X. b, c, Scatter plots showing the correlation of SREBF1 in vitro dependency and brain metastatic potential in MetMap500 (b) and MetMap125 (c). Strong inverse correlation was observed for breast cancer in both datasets. Each dot represents a cell line.
Extended Data Fig. 6 | Association of chr 8p gene copy number status and PI3K-response signatures with brain metastasis in clinical breast cancer specimens. a, Heat maps showing coordinated expression of chr 8p genes mirrored their copy number status in the two large breast cancer datasets, METABRIC and TCGA. The 8plow cluster is defined by CNA data. The right panel shows distribution of 8plow cluster in different breast cancer subtypes and its association with disease specific survival. P values calculated using two-sided, log-rank tests. CNA, Copy Number Alteration. Exp, RNA-Seq Expression. b, Hierarchical clustering of primary breast tumours by 8p gene expression in the EMC-MSK dataset. The 8plow cluster is enriched in tumours that developed brain metastasis, but not lung or bone metastasis. The right panel shows organ-specific metastasis free survival curves stratified by 8plow status. The 8plow cluster displays poorer brain metastasis free survival compared to the 8pWT cluster. Brain metastasis free survival curves stratified by 8plow status in different subtypes are also presented. P values calculated using two-sided, log-rank tests. c, Hierarchical clustering of breast cancer metastases by 8p gene expression, with the 8plow cluster being enriched in brain metastases. d, Chr 8p CNA status determined by Targeted Seq in the MSK metastatic breast cancer dataset. Brain metastases are enriched in chr 8p deletion compared to primary tumour, local recurrence and metastases at other sites. The 8plow cluster predicts poor brain metastasis free survival. P values calculated using two-sided, log-rank tests. LN, lymph node. e, Heat maps showing co-regulated patterns of two independent PI3K-response signatures in METABRIC and TCGA breast cancer datasets. PI3Ksig.1 was generated by overexpression of PIK3CAmut in breast epithelial cells. PI3Ksig.2 was generated by PI3K inhibitor treatment in the CMap database. The right panel shows distribution of PI3Ksighigh cluster in different breast cancer subtypes and its association with disease specific survival. P values calculated using two-sided, log-rank tests. f, Hierarchical clustering of primary breast tumours by PI3K signatures in the EMC-MSK dataset. The PI3Ksighigh cluster is enriched in tumours that developed brain metastasis. The right panel shows organ-specific metastasis free survival curves stratified by PI3K signatures. The PI3Ksighigh cluster displayed poorer brain metastasis free survival. Brain metastasis free survival curves stratified by PI3K signatures in different subtypes are also presented. P values calculated using two-sided, log-rank tests. g, Hierarchical clustering of breast cancer metastases by PI3K signatures, with the PI3Ksighigh cluster being enriched in brain metastases. h, Heat maps showing significant yet non-complete overlap between 8plow and PI3Ksighigh clusters in the EMC-MSK dataset. 8plow and PI3Ksighigh clusters co-capture a subset of patients with the worst brain metastasis prognosis. P values calculated using two-sided, log-rank tests. The lower panel presents a Cox proportional-hazards model of brain metastasis free survival using multiple variates (8p, PI3Ksig and breast cancer subtype). The 8plow/PI3Ksighigh cluster is the most associated with brain metastasis. i, 8plow and PI3Ksighigh clusters co-capture the majority of brain metastasis samples.
Extended Data Fig. 7 | Lipid metabolite profile changes upon SREBF1 knockout. Heat maps showing relative lipid abundance in cells cultured in medium supplemented with serum or delipidated serum. SREBF1-WT and SREBF1-KO of JIMT1 (PIK3CAmut) and HCC1806 (8plow) were used. Lipid species groupings and lipid desaturation levels are also presented. WT, wild-type; KO, knockout.
Extended Data Fig. 8 | Analysis of multiplexed breast cancer metastasis in vivo transcriptomes. a, A schematic of the differential analysis approach for in vivo transcriptomes with mixed cancer cell lines. An in silico transcriptome was modelled based on single cell line in vitro transcriptomes and cell line composition (comp.) of the metastasis sample. The in silico profile was then compared with the actual in vivo data in a pairwise manner. b, Comparison of in silico modelled profiles to the actual pre-injected or in vivo metastasis sample profiles. The pre-injected populations are direct mixtures of in vitro cell lines and show tight correlation with in silico data. In vivo samples show large fold changes. c, Box plots showing log2 fold changes of MUCL1 and SCGB2A2 in in vivo metastasis samples and pre-injected cells. Each point represents a sample. Box plots display quartiles of the data. Outlier points extend beyond 1.5 × interquartile ranges from either hinge. d, Heat map showing log2 fold change of lung metastasis genes (Minn et al.) in lung, liver, kidney and bone metastasis samples from the pilot study, where MDAMB231 dominated the population. e, Correlation of gene expression changes in different metastasis sites. Pre-injected population had no expression change and thus showed no correlation with in vivo samples. Brain metastases showed weaker correlations with extracranial metastases. f, Bubble plot showing enrichment of Hallmark gene pathways (MSigDB) comparing in vivo expression of metastases at different organ sites to their in vitro counterparts. g, Bubble plot showing in vivo upregulation of SREBF1, SCD and SREBF1-response signature in brain metastases. h, i, GSEA analysis of lipid metabolism gene sets using in vivo RNA-Seq profiles combined by metastasis organ sites irrespective of sample or cell line composition (h). Gene sets related to lipid metabolism are selectively enriched on top in the brain but not in other organs or in vitro. Restricting analysis to JIMT1-dominant samples revealed a similar result. No enrichment was seen in normal brain when analysis was performed on GTEX normal tissue (i). Each tick represents a lipid metabolism gene set from MSigDB. ***, P = 0.001; **, P = 0.01.
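The in silico modelling in panel a can be sketched as a composition-weighted mixture of single-cell-line profiles followed by a paired log2 fold change against the observed in vivo profile. This is a minimal illustration, not the authors' code: mixing on a linear expression scale, the pseudocount of 1, the function names and all numbers are assumptions.

```python
import math


def in_silico_profile(in_vitro, composition):
    """Composition-weighted average of single-cell-line expression profiles.

    in_vitro:    {cell_line: {gene: expression on a linear scale}}
    composition: {cell_line: fraction of the sample}, fractions summing to 1
    """
    genes = next(iter(in_vitro.values())).keys()
    return {
        gene: sum(frac * in_vitro[line][gene] for line, frac in composition.items())
        for gene in genes
    }


def log2_fold_change(observed, expected, pseudocount=1.0):
    """Paired log2 fold change of the observed profile over the modelled one."""
    return {
        gene: math.log2((observed[gene] + pseudocount) / (expected[gene] + pseudocount))
        for gene in expected
    }


# Hypothetical toy data: two cell lines, two genes.
in_vitro = {
    "LINE_A": {"GENE1": 10.0, "GENE2": 100.0},
    "LINE_B": {"GENE1": 30.0, "GENE2": 20.0},
}
composition = {"LINE_A": 0.75, "LINE_B": 0.25}

expected = in_silico_profile(in_vitro, composition)
observed = {"GENE1": 63.0, "GENE2": 80.0}  # made-up in vivo measurement
lfc = log2_fold_change(observed, expected)
```

In this toy case GENE1 is fourfold higher in vivo than the mixture predicts, while GENE2 matches the prediction. That is the kind of contrast such a comparison is meant to separate from pure composition effects.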
Extended Data Fig. 9 | Expression of TGFβ signalling, EMT status, inflammatory response and lipid metabolism genes in clinical breast cancer metastasis specimens. a, Comparison of brain metastasis versus extracranial metastasis clinical samples. Lower expression of TGFβ signature genes and EMT signature genes in brain metastases than other metastasis sites. Enriched expression of selective SREBF1 target genes (including FASN, SCD, SREBF1 itself) and Pentose Phosphate Pathway (PPP) genes in brain metastases. b, c, A strategy to remove brain stroma contamination effect from brain metastasis expression profiles when performing comparison of paired primary breast tumour and brain metastasis clinical specimens. A gene signature indicating brain stroma contamination was derived from comparison of brain with breast and breast cancer brain metastasis (b). Arrowheads indicate a few brain metastasis samples with noticeable brain stroma contamination. A brain contamination score was calculated and its effect was regressed out in the RNA-Seq data of matched primary tumours and brain metastases (c). The heat map shows expression of brain stroma indicator before and after removal of the contamination effect. d, e, Paired comparison of primary breast tumour and brain metastasis clinical specimens after removal of brain stroma contamination. d, Lipid metabolism genes and PPP genes. e, Signature scores were projected for each sample using the corrected RNA-Seq profiles. P, Primary breast tumour; M, brain Metastasis; upregulation in red, downregulation in blue. P values calculated using paired, two-sided t-tests.
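Regressing a contamination score out of expression data, as described for panel c, can be sketched per gene with ordinary least squares. The caption does not specify the regression model, so the single-covariate linear fit, the choice to keep the intercept, the function name and the toy values below are all assumptions.

```python
def regress_out(expression, score):
    """Remove the linear effect of a per-sample score from one gene.

    Fits expression ~ intercept + slope * score by ordinary least squares
    and returns expression minus slope * score (the intercept is kept).
    """
    n = len(score)
    mean_s = sum(score) / n
    mean_e = sum(expression) / n
    var_s = sum((s - mean_s) ** 2 for s in score)
    if var_s == 0:
        # A constant score carries no information to remove.
        return list(expression)
    slope = sum((s - mean_s) * (e - mean_e) for s, e in zip(score, expression)) / var_s
    return [e - slope * s for e, s in zip(expression, score)]


# Hypothetical data: a stroma-driven gene that rises by 2 units per score unit.
contamination_score = [0.0, 1.0, 2.0, 3.0]
stroma_gene = [5.0, 7.0, 9.0, 11.0]
corrected = regress_out(stroma_gene, contamination_score)
```

After correction the toy gene is flat across samples, which mirrors the before-and-after view of the brain stroma indicator described for the heat map.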
Extended Data Fig. 10 | In vivo and in vitro effects of SREBF1 knockout. a, Growth kinetics of SREBF1-WT and -KO cells in in vitro culture medium with 10% serum or 10% delipidated serum. Cell growth was monitored by Incucyte real-time imaging. WT, wild-type, in black; KO, knockout, in red. Two independent guides were used per group. b, Fluorescence imaging of metastases in serial brain sections from mice receiving intracardiac injection of JIMT1 SREBF1-WT or -KO cells (Fig. 5d). Confocal tile scans of representative sections are presented at the lower panel. GFP+ signals indicate cancer lesions. Circles highlight macro-metastatic lesions and arrows indicate micro lesions. c, d, One-by-one assessment of lipid metabolism gene fitness in additional brain metastatic cell lines through intracranial injection. SREBF1 was tested for HCC1954, MDAMB231 (c) and HCC1806. Additional genes were tested for HCC1806 (d). Cell outgrowth in brain metastasis was monitored by real-time BLI. Two independent guides per gene were tested, in a one guide one mouse fashion. e–g, Outgrowing (HCC1806) or residual (JIMT1) SREBF1-KO cells from brain metastases were derived for CRISPR-seq (e), western blot (f), and RT–qPCR (g) assays. e, CRISPR-seq quantifying SREBF1 gene editing efficiencies of brain-derived and pre-injected cells. f, Western blot quantifying SREBF1 protein levels. g, RT–qPCR quantifying relative expression of SREBF1, SCD, CD36, FABP6 in brain-derived versus pre-injected cells. Pre-injected WT HCC1806 was used as reference. h, i, Brain-derived and pre-injected HCC1806 cells were cultured in brain-slice-conditioned medium (CM) or medium supplemented with cerebrospinal fluid, or serum, or delipidated serum, or SM1 supplement, and western blot (h) or RT–qPCR was performed (i). SREBF1, SCD and CD36 were upregulated when cells were cultured in brain slice CM, cerebrospinal fluid, and delipidated serum. Brain-derived SREBF1-KO cells were better at inducing SCD and CD36, in comparison to pre-injected SREBF1-KO cells. Experiments were performed twice independently with similar results.
nature research | reporting summary
Corresponding author(s): Todd R Golub; Xin Jin
Last updated by author(s): Jul 21, 2020
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Data analysis The following software was used for data analysis: Living Image software (v4.5), Bowtie 2 (v2.2.8), samtools (v1.3.1), BBSplit (https://sourceforge.net/projects/bbmap/), RSEM (v1.3.1), R statistical software (v3.6.2), ggplot2 (3.3.0), limma (3.42.2), edgeR (3.28.0), gsva (1.34.0), gplots (3.0.1.2), survival (3.1-8), fgsea (1.12.0), GSEA (v3.0), GenePattern (v2.0).
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers.
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
MetMap data and interactive visualization can be accessed at pubs.broadinstitute.org/metmap. RNA-seq data generated from this study have been deposited to
Gene Expression Omnibus (GEO), at accession numbers GSE148283 and GSE148372. Additional datasets used in this study include METABRIC, TCGA, and MSK-
targeted-sequencing breast cancer datasets downloadable from cBioPortal, EMC-MSK dataset (GSE2035, GSE2603, GSE5327, GSE12276), 65 metastasis sample
dataset (GSE14020), paired primary tumor and brain metastasis RNA-Seq from Vareslija et al, and GSE52604.
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf
Data exclusions Failed RNA-Seq samples were excluded from analysis presented in the manuscript. In MetMap500 experiment (Fig. 2), one animal died early
and organs could not be collected in time, and is excluded from analysis.
Replication Cell culture based experiments (including growth assay, RT-qPCR, western blot) were performed twice independently. Animal experiments
were validated using completely independent methods instead of direct repeat (Pooled experiment vs individual injection in Fig. 1a, Extended
Data Fig. 2g; MetMap500 vs MetMap125 in Fig. 2c; mini-pool CRISPR screen vs one-by-one testing in Fig. 5a-c).
Randomization Randomization was not applicable to experiments in this study. In MetMap profiling, we varied pooling format, cell density, cohort size,
animal age to account for these potential covariates.
Blinding Blinding to group allocations was not applicable to experiments in this study.
Antibodies
Antibodies used SREBF1 primary antibody (14088-1-AP, Proteintech)
SCD (CD.E10) antibody (ab19862, Abcam)
GAPDH (D16H11) XP® Rabbit mAb (5174S, Cell Signaling)
β-Actin (8H10D10) Mouse mAb (3700S, Cell Signaling)
IRDye® 800CW Goat anti-Mouse IgG (926-32210, LI-COR)
IRDye® 680RD Goat anti-Rabbit IgG secondary antibodies (926-68071, LI-COR).
Validation SREBF1 primary antibody (14088-1-AP, Proteintech): validated by manufacturer, and by this study (Extended Data Fig. 11f,h), and
cited in publications, suitable for western blot
SCD (CD.E10) antibody (ab19862, Abcam): validated by manufacturer, and by this study (Extended Data Fig. 11f,h), suitable for
western blot
GAPDH (D16H11) XP® Rabbit mAb (5174S, Cell Signaling): validated by manufacturer and cited in publications, suitable for
western blot
β-Actin (8H10D10) Mouse mAb (3700S, Cell Signaling): validated by manufacturer and cited in publications, suitable for western
blot
Eukaryotic cell lines
Authentication Cell lines were authenticated by DNA fingerprinting analysis. The breast cell line identities were also confirmed by RNA-Seq
and compared to CCLE RNA-Seq profiles.
Mycoplasma contamination All cell lines were confirmed to be mycoplasma free using the MycoAlert™ Mycoplasma Detection Kit (Lonza).
Commonly misidentified lines (See ICLAC register) PC-14 was identical to PC-9 as reported before (https://web.expasy.org/cellosaurus/CVCL_1640; https://www.sigmaaldrich.com/catalog/product/sigma/cb_90071810?lang=en&region=US). To keep consistent with CCLE nomenclature, PC14_LUNG was used. KPL-1 was found to be a MCF-7 derivative (https://web.expasy.org/cellosaurus/CVCL_2094). To keep separate from MCF-7 and consistent with CCLE nomenclature, KPL1_BREAST was used.
Ethics oversight Animal work was performed in accordance with a protocol approved by the Broad Institute Institutional Animal Care and Use
Committee (IACUC).
Note that full information on the approval of the study protocol must also be provided in the manuscript.
Flow Cytometry
Plots
Confirm that:
The axis labels state the marker and fluorochrome used (e.g. CD4-FITC).
The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers).
All plots are contour plots with outliers or pseudocolor plots.
A numerical value for number of cells or percentage (with statistics) is provided.
Methodology
Sample preparation Organs were dissociated using dissociation protocols listed in Supplementary Table 9 with gentleMACS Octo Dissociator (Miltenyi
Biotec). Dissociated cell suspensions were filtered using 100 μm filters, and washed with DMEM/F12 twice. Cell suspensions
were then washed with staining buffer (PBS + 2mM EDTA + 0.5% BSA), and incubated with mouse cell depletion beads according
to the instructions (Miltenyi Biotec). Cell suspensions were subjected to negative selection using autoMACS Pro Separator
(Miltenyi Biotec) to deplete mouse stroma. Brains were subjected to an additional myelin debris depletion step using myelin
removal beads II (Miltenyi Biotec). In vitro cultured cells were trypsinized and resuspended as single cell suspensions. DAPI
staining was used to exclude dead cells.
Cell population abundance Data is presented in Extended Data Fig 1c and Source Data.
Gating strategy Gating strategy is illustrated in Extended Data Fig. 1e to select for single cells with the fixed gate for GFP or mCherry.
Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information.
Article
https://doi.org/10.1038/s41586-020-2962-9
Received: 20 July 2019
Accepted: 17 September 2020
Published online: 25 November 2020
Hongbo Yang1,11, Yu Luan1,11, Tingting Liu1,11, Hyung Joo Lee2, Li Fang3, Yanli Wang4, Xiaotao Wang1, Bo Zhang4, Qiushi Jin1, Khai Chung Ang5, Xiaoyun Xing2, Juan Wang1, Jie Xu1, Fan Song4, Iyyanki Sriranga1, Chachrit Khunsriraksakul4, Tarik Salameh4, Daofeng Li2, Mayank N. K. Choudhary2, Jacek Topczewski6,7, Kai Wang3, Glenn S. Gerhard8, Ross C. Hardison9, Ting Wang2, Keith C. Cheng5 & Feng Yue1,10 ✉
The zebrafish has been an important vertebrate model system for several decades because of its high fecundity, external embryogenesis, rapid embryonic development and nearly transparent embryos. These features have made it an ideal system for the study of vertebrate development and ageing2, comparative genomics3 and human disease modelling. However, there is no comprehensive annotation of the cis-regulatory elements in the zebrafish genome. Although previous genomic studies in zebrafish have provided critical biological insights4–8, most used whole embryos and our understanding of tissue-specific regulators remains limited.
To profile the transcribed regions, chromatin accessibility and DNA methylation patterns in the zebrafish genome, we performed strand-specific RNA sequencing (RNA-seq), ATAC-seq9 and whole-genome bisulfite sequencing (WGBS) in up to eleven zebrafish adult tissues and two embryonic tissues (Fig. 1a, b, Supplementary Table 1). Because histone modifications have been used to predict different classes of potential regulatory elements such as enhancers and repressors10,11, we also performed chromatin immunoprecipitation followed by DNA sequencing (ChIP-seq) for a panel of histone modifications, including H3K4me3, H3 lysine 27 acetylation (H3K27ac), H3 lysine 9 dimethylation (H3K9me2) and H3 lysine 9 trimethylation (H3K9me3). To study higher-order chromatin structure and link distal enhancers to their target genes, we performed Hi-C experiments in adult brain and muscle (Fig. 1a, b). Although chromosome 4 is regarded as the ‘rudimentary’ sex chromosome in zebrafish12, the quality of its current assembly is poor owing to the heavy presence of heterochromatin. Therefore, we performed three long-read DNA sequencing experiments (nanopore, 10X Genomics and Bionano optical mapping) in one Tübingen female zebrafish to generate a de novo assembly of chromosome 4. To investigate the cell types and their regulatory elements in the zebrafish
1Department of Biochemistry and Molecular Genetics, Feinberg School of Medicine Northwestern University, Chicago, IL, USA. 2Department of Genetics, The Edison Family Center for Genome
Sciences and Systems Biology, Washington University School of Medicine, St Louis, MO, USA. 3Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children’s Hospital of
Philadelphia, Philadelphia, PA, USA. 4Bioinformatics and Genomics Program, The Pennsylvania State University, State College, PA, USA. 5Department of Pathology and Penn State Zebrafish
Functional Genomics Core, College of Medicine, The Pennsylvania State University, Hershey, PA, USA. 6Department of Pediatrics, Northwestern University Feinberg School of Medicine,
Chicago, IL, USA. 7Stanley Manne Children’s Research Institute, Ann and Robert H. Lurie Children’s Hospital of Chicago, Chicago, IL, USA. 8Department of Medical Genetics and Molecular
Biochemistry, Lewis Katz School of Medicine at Temple University, Philadelphia, PA, USA. 9Department of Biochemistry and Molecular Biology, Pennsylvania State University, University
Park, PA, USA. 10Robert H. Lurie Comprehensive Cancer Center of Northwestern University, Chicago, IL, USA. 11These authors contributed equally: Hongbo Yang, Yu Luan, Tingting Liu.
✉e-mail: Yue@northwestern.edu
[Fig. 1 image, panels a–f. Recoverable labels: assay tracks Hi-C, H3K27ac, H3K4me3, H3K9me3, H3K9me2, WGBS and RNA-seq across tissues; browser region chr4:22.80 Mb-23.32 Mb (cd36, magi2); log(TPM+1) of myh6 in zebrafish and MYH6 in human; brain-specific genes and their human orthologues; a predicted novel transcript at chr1:660,000-671,000; RNA-seq and H3K4me3 intensity for novel transcripts (n = 8,311) from –2 kb to +3 kb around the TSS.]
Fig. 1 | Identification of cis-regulatory elements in the zebrafish genome. a, Tissues and analyses performed in this study. H3K27ac, H3K4me3, H3K9me3 and H3K9me2 represent ChIP-seq analyses with the indicated antibody. b, Snapshot of an example region, showing Hi-C, ChIP-seq, ATAC-seq, WGBS and RNA-seq data in adult zebrafish brain, muscle and liver, using WashU Epigenome Browser. Plots show relative amount, normalized to the range of values. The values on the y-axis for ChIP-seq analyses were normalized to the input. Data range for the Hi-C heat map is 0–40 raw read counts. c, Expression of myh6 in zebrafish and the paralogue MYH6 in human are heart-specific (n = 2). The values for human expression were from GTEx. Data are mean ± s.e.m. TPM, transcripts per million base pairs. d, Box plot of the expression of brain-specific genes in zebrafish (top) (n = 2,481) and the expression of their orthologues in human (bottom) (n = 2,481). The y-axis shows the gene expression value: log10(TPM+1). e, An example of a predicted novel transcript. Vertical scale: 0–20 (H3K27ac and H3K4me3), 0–10 (RNA-seq). f, RNA-seq and H3K4me3 ChIP-seq signals for the predicted 8,311 novel transcripts across all the tissues. In all box plots, horizontal line shows the median, the box encompasses the interquartile range, and whiskers extend to 5th and 95th percentiles.
brain, we performed single-cell ATAC-seq (scATAC-seq). In total, we generated 161 genomic datasets that comprised over 10 billion reads. To our knowledge, this is the most comprehensive analysis of candidate cis-regulatory elements in zebrafish to date and represents a major resource for comparative genomics and the study of gene regulation in this vertebrate model organism.
Transcriptome analysis
We detected 39,188 transcripts across all tissues using RNA-seq, 14,764 of which exhibited tissue-specific patterns (Extended Data Fig. 1a–c). We identified 13,285 previously unknown transcripts, 8,311 of which were also supported by H3K4me3 peaks at the promoter regions (Fig. 1e, f, Extended Data Fig. 1d, Supplementary Table 2). These 8,311 transcripts include 976 long noncoding RNAs (lncRNAs), 3,596 previously unknown isoforms and 3,739 potential previously uncharacterized protein-coding genes.
Next, we examined whether the expression patterns for the orthologues of tissue-specific genes were conserved between zebrafish and human. Among the 14,764 tissue-specific zebrafish transcripts, 3,737 have a one-to-one human orthologue, 1,747 of which (47%) also show tissue-specific patterns in human (Fig. 1c, d, Supplementary Table 3), suggesting that these genes might have a critical and conserved role in the tissues in which they are uniquely expressed.
in each tissue and merged them into a list of 436,036 non-redundant peaks across all tissues (Fig. 2a, Supplementary Table 5). Of these peaks, 116,353 had not been identified in previous work in whole embryos15–20 (Extended Data Fig. 2a and Supplementary Table 6). As expected, distal ATAC-seq peaks show higher tissue specificity than proximal peaks (Fig. 2b, c).
We then defined the cis-regulatory elements with the following combinations of histone modifications and ATAC-seq peaks: active promoter (H3K27ac, H3K4me3 and ATAC-seq), weak promoter (H3K4me3 and ATAC-seq), active enhancer (distal H3K27ac and ATAC-seq) and heterochromatin (H3K9me2 or H3K9me3 sites). Across all the tissues, we predicted 25,593 active promoters, 40,220 weak promoters, 58,065 active enhancers and 112,445 heterochromatin sites (Extended Data Fig. 2b, c and Supplementary Tables 7–10). A total of 40.9% of the predicted promoters and 62.5% of the predicted enhancers reported in this study were not identified in previous reports6,21–25 (Extended Data Fig. 2a). Of the enhancers, 71.3% were tissue-specific and also showed tissue-specific ATAC-seq signals (Fig. 2d, e). Gene ontology (GO) analysis showed that they were located near genes important for relevant tissue-specific functions (Extended Data Fig. 2d).
To validate the predicted enhancers and their tissue specificities, we used a GFP-based zebrafish embryo reporter assay. Of the 32 tissue-specific enhancers tested, 87.5% (28 out of 32) showed restricted GFP expression (Fig. 2f, Extended Data Fig. 3, Supplementary Table 11).
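The combinations of histone modifications and ATAC-seq peaks that define each class of cis-regulatory element in the text can be written as a small rule table. The sketch below is only illustrative: the order in which the rules are tried, the boolean `distal` flag, the `unclassified` fallback and the function name are assumptions, because the text does not state how overlapping cases or distance thresholds were handled.

```python
def classify_element(marks, distal):
    """Assign a region to a regulatory class from the marks that overlap it.

    marks:  set drawn from {"H3K27ac", "H3K4me3", "ATAC", "H3K9me2", "H3K9me3"}
    distal: True if the region lies away from an annotated promoter
            (how 'distal' is measured is left open here).
    """
    if {"H3K27ac", "H3K4me3", "ATAC"} <= marks:
        return "active promoter"
    if {"H3K4me3", "ATAC"} <= marks:
        return "weak promoter"
    if distal and {"H3K27ac", "ATAC"} <= marks:
        return "active enhancer"
    if marks & {"H3K9me2", "H3K9me3"}:
        return "heterochromatin"
    return "unclassified"


# Hypothetical regions, one per rule.
examples = [
    classify_element({"H3K27ac", "H3K4me3", "ATAC"}, distal=False),
    classify_element({"H3K4me3", "ATAC"}, distal=False),
    classify_element({"H3K27ac", "ATAC"}, distal=True),
    classify_element({"H3K9me3"}, distal=True),
    classify_element({"ATAC"}, distal=True),
]
```

Checking the most specific combination first keeps a region with all three active marks from being labelled a weak promoter. A real pipeline would also need an interval-overlap step to decide which marks belong to which region.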
[Fig. 2 image, panels a–j. Recoverable labels: numbers of proximal and distal ATAC-seq peaks per tissue and their genomic distribution (distal, exon, intron, promoter); clustering of enhancers; ATAC-seq signal; brain enhancer (5 dpf) and heart enhancer (3 dpf) reporter examples; example loci elavl3 (chr3:48,934,231-48,935,968), gata6 (chr2:4,311,945-4,313,944), musk (chr10:13,126,525-13,128,100) and zgc153722 (chr23:16,855,310-16,856,895); bulk ATAC-seq versus scATAC-seq (PCC = 0.878); t-SNE; motif enrichment (MA0077.1_SOX9).]
Fig. 2 | Characterization of tissue-specific cis-regulatory elements. a, Number of ATAC-seq peaks predicted in each tissue and their genomic distribution. b, c, Tissue specificity of proximal and distal ATAC-seq peaks in 11 adult tissues. d, Clustering analysis identified tissue-specific enhancers. Values in the heat map were input-normalized H3K27ac intensity (n = 58,226 enhancers). e, Normalized ATAC-seq intensity in the corresponding enhancer elements shown in d. f, Examples of validated tissue-specific enhancers by GFP reporter assay in zebrafish embryos. For brain enhancer 5, 112 out of 143 surviving embryos showed similar patterns. For heart enhancer 7, 61 out of the 67 surviving embryos showed similar patterns. For muscle enhancer 5, 53 out of the 63 surviving embryos showed similar patterns. For kidney enhancer 1, 47 out of the 82 surviving embryos showed similar patterns. Scale bar, 200 μm. dpf, days post-fertilization. g, Pearson correlation coefficient between aggregated signals of scATAC-seq and bulk ATAC-seq data. Values are the sums of the reads in continuous 10-kb bins, normalized by sequencing depth. h, The overlap between peaks predicted in bulk and scATAC-seq data. i, t-distributed stochastic neighbour embedding (t-SNE) analysis identified 25 clusters in the scATAC-seq data in zebrafish adult brain (n = 19,955). j, Examples of enriched motifs in different clusters from scATAC-seq peaks (n = 19,955).
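The comparison described in panel g (read sums in continuous 10-kb bins, normalized by sequencing depth, then a Pearson correlation) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the read positions, the 50-kb chromosome length and the per-million scaling are assumptions made for the example.

```python
from math import sqrt

BIN_SIZE = 10_000  # 10-kb bins, as stated in the caption


def binned_signal(read_starts, chrom_length, bin_size=BIN_SIZE):
    """Count reads per fixed-width bin and scale to reads per million (assumed scaling)."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    depth = len(read_starts)
    return [c * 1e6 / depth for c in counts] if depth else [0.0] * n_bins


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Toy read start positions on a hypothetical 50-kb chromosome.
bulk_reads = [100, 200, 10_500, 10_900, 11_000, 25_000, 41_000, 42_000]
aggregated_sc_reads = [150, 10_100, 10_200, 26_000, 40_500, 47_000]

bulk_bins = binned_signal(bulk_reads, 50_000)
sc_bins = binned_signal(aggregated_sc_reads, 50_000)
pcc = pearson(bulk_bins, sc_bins)
print(f"PCC = {pcc:.3f}")
```

Depth normalization does not change the Pearson coefficient itself (it is scale-invariant), but it keeps the binned tracks comparable when they are plotted against each other.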
98.7% of the bulk ATAC-seq peaks in the brain (Fig. 2g, h, Extended Data Fig. 4a–c). Among them, 73,264 peaks were detected only by scATAC-seq, suggesting that there are potentially more regulatory elements in the zebrafish genome than we predicted on the basis of the bulk tissue results.
Using the scATAC-seq data, we identified 25 clusters of cells in the zebrafish brain (Fig. 2i). By identifying the key cell-type-specific transcription factor motifs, we inferred the potential cell type of each cluster, such as oligodendrocyte progenitor cells and prefrontal cortex cells (Extended Data Fig. 4d, e). We quantitatively determined the enrichment of transcription factor motifs in each of the 25 clusters (Fig. 2j, Extended Data Fig. 4f). Many neuronal transcription factors (such as SOX9 and OLIG2) were enriched in different clusters, suggesting potential roles in cell-type-specific regulation in the zebrafish brain.

Heterochromatin and DNA methylation
We performed ChIP-seq for H3K9me2 and H3K9me3 in 11 adult zebrafish tissues (Extended Data Fig. 5a). Across all tissues, we identified 73,777 non-redundant H3K9me2 sites and 68,798 non-redundant H3K9me3 sites. While both H3K9me2 and H3K9me3 are heterochromatic marks, they were located in different parts of the genome, with overlap of about 10% in the same tissue (Extended Data Fig. 5b–d). Short interspersed nuclear elements (SINEs) were enriched in both H3K9me2 and H3K9me3 sites, whereas long terminal repeats (LTRs) were enriched only in H3K9me2 sites (Fig. 3b). Although both H3K9me2 and H3K9me3 sites were depleted of active marks within the same tissue, 20% of these sites overlapped with ATAC-seq peaks or other active marks in other tissues (Fig. 3a, Extended Data Fig. 5e, g), suggesting that heterochromatin regions in one tissue may be active regulatory elements in other tissues.
To study DNA methylation patterns in zebrafish, we performed WGBS in 11 adult tissues with approximately 30× coverage in each dataset. Genome-wide CpG methylation levels were approximately 80% across different tissues, with the exception of the testis, which exhibited higher CpG methylation levels (Extended Data Fig. 6a, b). We also detected increased levels of methylation at the CAC trinucleotide in brain compared with other tissues (Extended Data Fig. 6c), similar to reports in human and mouse26. Unmethylated CpGs were found mostly in CpG islands, gene promoters and 5′ untranslated regions (Extended Data Fig. 6d, e), whereas CpGs in gene bodies and different classes of repetitive elements were heavily methylated (Extended Data Fig. 6e). We also identified unmethylated regions (UMRs) and low-level-methylated regions (LMRs) (Supplementary Table 12). Most UMRs overlapped with candidate promoters and proximal ATAC-seq peaks, whereas LMRs overlapped more with candidate enhancers and
[Fig. 3 graphic not reproduced. Recoverable panel labels: heterochromatin sites with active marks in the same tissue and in other tissues, by tissue (blood, colon, heart, intestine, kidney, liver, muscle, skin, spleen, testis); repeat-class composition (DNA, LINE, LTR, RC, SINE, satellite, unknown; percentage, log2(FC)); c, DNA methylation, H3K27ac and ATAC-seq intensity (–5 kb to +5 kb); d, examples of tissue-specific hypoDMRs (brain, testis; dazl, elavl4; tracks H3K4me3, H3K27ac, ATAC-seq, WGBS, RNA; chr19:20,815,000-20,820,000, chr8:15,964,500-15,984,000).]

By combining these long-read DNA-sequencing results with the Hi-C data from brain, we de novo assembled a new version of chromosome 4 (Methods, Extended Data Fig. 7c, Supplementary Dataset 1). With the newly assembled genome, we reprocessed the Hi-C data and observed that most of the aberrant signals were no longer visible on the Hi-C map (Fig. 3e, f). We reprocessed the BioNano optical-mapping data and also observed fewer structural-variation events (Extended Data Fig. 7d, e). This newly assembled chromosome 4 will serve as a resource to study sex determination and other processes in zebrafish that involve genes on this chromosome.

Conservation of cis-regulatory elements
Functional elements are often conserved during evolution27. We first examined the sequence conservation of different classes of
[Fig. 4 graphic not reproduced. Recoverable panel labels: percentage (%) of conserved exons, promoters, enhancers and random regions; percentage by tissue; fish phyloP score (–10 kb, enhancer, +10 kb); d, UCNE SALL3_Anna with H3K27ac, H3K4me3 and phyloP tracks in zebrafish brain (chr19:22,457,896-22,461,885), mouse brain (chr18:81,347,542-81,351,549) and human brain (chr18:76,480,477-76,485,273), and mouse reporter assay by VISTA; e, motif enrichment (P value) for RFX, OLIG2, NEUROD, ATOH1, TLX, ETV, PU.1, EHF, GATA2, GATA6, ETS1, FOXA2, BMAL1, HNF4A, HNF1, CDX2, ERRA, GATA4, GATA3, NUR77, MEF2D, MYOD, SIX1, P53 and ETV1 across zebrafish and human tissues; f, frequency (×10^3) against number of enhancers linked per gene and number of genes linked per enhancer (mean = 4.85; mean = 2.67; to 1 gene = 34%); g, enhancer-to-gene pairs against genomic distance (kb), with expected values and 95% CI.]

Fig. 4 | Conservation of zebrafish cis-regulatory elements and transcriptional networks. a, Percentage of zebrafish exons and cis-regulatory elements that have orthologous sequences in human. Total number for each bar: exon, 1,000; promoter, 25,593; enhancer, 58,065; and random, 1,000. For exons and random, we randomly sampled 1,000 elements and computed the percentage conservation. The simulations were performed 20 times and the mean percentage is shown. b, Percentage of human orthologous sequences of zebrafish enhancers that were predicted as enhancers in human tissues. Total number for each bar: brain, 1,241; blood, 748; colon, 775; heart, 839; intestine, 564; kidney, 173; liver, 402; muscle, 591; skin, 356; and spleen, 1,000. c, Fish PhyloP score for the zebrafish enhancers with sequences that were not conserved in human (number of enhancers in red line is 51,446, blue line is 50,000). d, An ultra-conserved noncoding element predicted as a brain enhancer in zebrafish, mouse and human. This enhancer element has been validated by transgenic reporter assay in mouse (hs1056 in the VISTA Enhancer Browser). e, Heat map showing transcription factor motif enrichment in tissue-specific enhancers in zebrafish and human. f, Linking distal enhancers to their target genes by correlation of tissue-specific activity. Left, distribution of the predicted number of enhancers per gene. Right, distribution of predicted number of genes per enhancer. g, Validation of the predicted enhancer-to-gene pairs by Hi-C interaction counts in brain.

tissues in zebrafish and human (Fig. 4e). To further probe the similarities in transcription factor connections between zebrafish and human, we performed the three-node network analysis as previously described31 (Methods). We observed that CTCF was predicted as the driver node in most tissues, whereas tissue-specific transcription factors such as NEUROD and MYOD were predicted as middle and passenger nodes in the networks of brain and muscle tissue, respectively (Extended Data Fig. 9e, f). The overall patterns of the three-node networks were highly similar between zebrafish and human (Extended Data Fig. 9f), further demonstrating the value of zebrafish as a system to study human transcription factor regulatory circuits.

ated high-resolution Hi-C data (10 kb) in the adult brain and muscle (Extended Data Fig. 10a), with about 2.1 and 1.4 billion paired-end reads, respectively. Replicates of the Hi-C experiments were highly reproducible32. We predicted the A/B compartments and found that their genomic coverages in these two tissues were similar. H3K27ac, H3K4me3 and ATAC-seq signals were enriched in the A compartment, whereas H3K9me2 and H3K9me3 signals were enriched in the B compartment (Extended Data Fig. 10b). We identified 5,348 regions with switched A/B compartments between the two tissues, and these regions were associated with altered gene expression and H3K27ac signals (Extended Data Fig. 10c). We predicted 1,350 topologically associating domains (TADs) in the brain and 1,238 TADs in the muscle (Extended Data Fig. 10d, Supplementary Table 15). Most of the TADs were shared between the two tissues (Extended Data Fig. 10b, e) and TAD boundaries were enriched for CTCF-binding sites, SINEs and satellite elements (Extended Data Fig. 10f–h).
We identified 7,708 and 5,312 chromatin loops in the adult brain and muscle, respectively (Fig. 5a, Supplementary Table 16). The major- […] muscle and 63% in brain). Of the predicted loops in the brain, 98.6% were between regions that contain either at least one promoter or one enhancer, and 91.6% of enhancer–promoter loops overlapped with linkage pairs (Fig. 5b). We performed motif analysis to identify the transcription factors that may have a role in forming the chromatin loops. CTCF and BORIS were enriched in shared loops (Fig. 5a), and tissue-specific transcription factors were enriched in tissue-specific chromatin interactions (Fig. 5a). For example, RFX and NeuroD2 were enriched in brain-specific loops, whereas two muscle-specific master regulators, Myf5 and Ascl1, were enriched in muscle-specific loops (Fig. 5c).

Zebrafish genome evolution and TADs
TADs have been shown to be conserved among different species33–36. To investigate the relationship between TADs and zebrafish genome evolution, we first identified three sets of zebrafish evolutionary breakpoints by aligning its genome against chicken, mouse and human, respectively. We then compared the breakpoints with TAD annotations in zebrafish and observed that 80.5% of breakpoints (984 of 1,223 zebrafish-to-human breakpoints) were located near TAD boundaries, but depleted towards the centre of TADs (Fig. 5d, Extended Data Fig. 11a). We divided TADs into two groups: TADs containing a breakpoint and TADs not containing a breakpoint. TADs without breakpoints had stronger interaction frequencies in the middle than TADs with breakpoints (Fig. 5e, Extended Data Fig. 12d). Further, the expression patterns of genes across different tissues in the TADs without breakpoints were more correlated with those of their homologues in human (Fig. 5f) than the other group, suggesting that there is an association between TAD stability and conservation of the expression pattern. This may be caused by strong chromatin interactions that may contribute to TAD stability during evolution, or breaking of TADs with strong interactions that are selected against in evolution, as these interactions and the genes involved in the interactions may be physiologically important for zebrafish.
Next, we divided zebrafish TAD boundaries into two classes, boundaries overlapping with breakpoints and boundaries not overlapping with breakpoints. We observed higher H3K4me3 signals at TAD boundaries, as previously described36. We found a much higher level of H3K4me3 at TAD boundaries with breakpoints (Fig. 5g, Extended Data Fig. 11b, c). We also confirmed similar higher H3K4me3 enrichment in human or mouse TAD boundaries with evolutionary breakpoints (Extended
[Fig. 5 graphic not reproduced. Recoverable panel labels: a, brain-specific loops (RFX2, NeuroD2), shared loops (BORIS, CTCF, X-box) and muscle-specific loops (Ascl1, Myf5); b, loop anchor annotation (E–P, P–P, none); c, shared loops, brain-specific loops and muscle-specific loop (akt3a, zbtb18, traip; H3K27ac and RNA-seq tracks in brain and muscle; chr13:13,000,000-17,500,000, chr13:10,000,000-11,800,000, chr11:33,750,000-35,600,000); d, number of breakpoints (BPs) relative to TADs (–500 kb to +500 kb), zebrafish vs mouse and zebrafish vs human; e, TADs without BPs and TADs with BPs (–1 Mb, centre, +1 Mb); f, correlation, with and without BPs; g, H3K4me3 signal at boundaries with and without BPs (–500 kb, boundary, +500 kb); h, Pol2 ChIP-seq and GRO-seq signal with and without CTCF (–200 kb, breakpoint, +200 kb).]
Fig. 5 | Higher-order chromatin structure and zebrafish genome evolution. a, Aggregate peak analysis plot and motif analysis of tissue-specific or shared chromatin loops. In each panel, n is the number of loops in that group. b, Left, annotation of cis elements in the predicted loop anchors in brain with a total of 7,710 loops in the pie chart. Right, comparison of promoter–enhancer chromatin loops with correlation-based linkage between ATAC-seq or histone modification-based enhancer–gene pairs with a total of 4,996 loops in the pie chart. c, Examples of shared, brain-specific and muscle-specific chromatin loops. d, Relative position of evolutionary breakpoints to TADs. Breakpoints were between zebrafish and mouse (left) or between zebrafish and human (right). In all cases, we found that the evolutionary breakpoints were enriched at zebrafish TAD boundaries and depleted from the centre of TADs. e, TADs without breakpoints (top) have stronger internal interactions than TADs with breakpoints (bottom). f, Expression pattern of genes in TADs without an evolutionary breakpoint is more highly conserved than genes in TADs that contain a breakpoint. For each gene, we collected its expression profile across the same ten tissues in both zebrafish and human, and computed a Spearman correlation coefficient between the profiles for each gene. Number of gene pairs: without BPs, 4,625; with BPs, 3,918; P = 3.56 × 10⁻²⁶, two-sided Mann–Whitney U test. g, H3K4me3 signals were higher in TAD boundaries with breakpoints than TAD boundaries without breakpoints. h, Higher transcriptional activities at TAD boundaries with breakpoints and containing CTCF binding sites in human GM12878 cells. Number of breakpoints: with CTCF, 639; without CTCF, 625. K562 cell data is shown in Extended Data Fig. 12c. Results from 17 additional vertebrates are shown in Extended Data Figs. 11a, b, 12d.
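Panel f rests on a per-gene Spearman correlation between the zebrafish and human expression profiles across the same ten tissues. A minimal sketch of that calculation is given below, assuming average ranks for ties; the two expression profiles are invented toy values and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy expression values for one orthologous gene pair across ten matched tissues.
zebrafish_profile = [50.0, 2.1, 0.5, 1.0, 0.8, 1.2, 0.3, 4.0, 0.9, 0.7]
human_profile = [80.0, 3.0, 0.2, 0.9, 1.5, 1.1, 0.1, 6.0, 0.4, 1.0]

rho = spearman(zebrafish_profile, human_profile)
print(f"Spearman rho = {rho:.3f}")
```

Because the coefficient is rank-based, it is insensitive to differences in expression units between the two species, which is one reason a rank correlation suits a cross-species comparison.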
Data Fig. 11d–f). As a control, we observed similar amounts of H3K27ac or ATAC-seq enrichment between the two groups of TAD boundaries (Extended Data Fig. 12a, b). Notably, an earlier report showed H3K4me3 signal enrichment at recombination hotspots in mouse37, and our finding in zebrafish further suggests its potential association with genome stability and evolution.
Previous work has suggested a link between transcription at CTCF-containing TAD boundaries and their potential role in translocations38–40. Therefore, we investigated the transcriptional status at the evolutionary breakpoints at TAD boundaries using the Pol2, CTCF ChIP-seq and GRO-seq data in GM12878 and K562 human cells. We observed that there were much higher transcription activities at breakpoints overlapping with CTCF-containing TAD boundaries, compared with those without CTCF TAD boundaries (Fig. 5h, Extended Data Fig. 12c).
A key feature of the zebrafish genome is an extra genome-duplication event compared with other vertebrates41. There were 2,456 paralogous gene pairs annotated in Ensembl and the paralogues show similar expression patterns across all different tissues, with a median Pearson correlation of 0.458 (Extended Data Fig. 12e). We analysed paralogue pairs located on the same chromosome and observed that paralogues located in the same TADs have a higher correlation in gene expression
Extended Data Fig. 1 | Tissue-specific gene expression in zebrafish. a, Clustering analysis of transcripts from RNA-seq data in embryonic and adult tissues (n = 31,842). b, c, Gene Ontology and KEGG pathway analysis for the tissue-specific genes in adult brain, heart and testis (numbers of tissue-specific genes in these two panels: brain, 3,693; heart, 392; testis, 1,605). d, Distribution of H3K4me3 signals surrounding the known and predicted novel transcripts. e, Human orthologues of zebrafish tissue-specific genes were more tissue-specific compared to human orthologues of non-tissue-specific zebrafish genes (n = 14,764, 3,739, 6,043, Mann–Whitney U test, two-sided, ***P < 2.2 × 10⁻¹⁶).
Extended Data Fig. 2 | Comparative analysis of zebrafish cis-regulatory elements. a, Comparison of the predicted regulatory elements identified with previous data. Enhancers were based on H3K27ac signals in the same four tissues (brain, heart, intestine, testis) from Perez-Rico et al. 2017. The data we generated are from Tübingen zebrafish strain and the published results were from the AB strain. b, Number of predicted cis-regulatory elements in each tissue. E-brain stands for 1 dpf embryonic neuron cells. E-trunk stands for 1 dpf zebrafish whole trunk region. c, An example showing genes with active promoters have higher expression level. Blue hollow bar indicates the known mrpl39 promoter. Orange hollow bar indicates the potential novel promoter. The mrpl39 promoter has H3K4me3 peaks in both muscle and brain, but only has strong H3K27ac signals in muscle and its expression is higher (4.43-fold). d, Gene Ontology results for the muscle-specific enhancers and skin-specific enhancers. We used the GREAT tool for this analysis (the numbers of tissue-specific enhancers used in this figure are muscle = 813, skin = 512).
Article
Extended Data Fig. 3 | Enhancer reporter assay for tissue-specific enhancers. In total, 28 of 32 predicted tissue-specific enhancers showed consistent GFP signals in the corresponding tissues. For the eight brain enhancers tested, 63/95, 51/86, 85/119, 112/143, 27/45, 34/48, 27/41, 62/77, and 37/45 embryos, respectively, had green signals in the brain region. For the six tested heart enhancers, 64/94, 52/85, 79/121, 20/41, 51/95, 32/55 and 20/31 embryos, respectively, had green signals in the heart region. For the six tested muscle enhancers, 52/57, 26/30, 107/124, 53/63, 93/114, 61/67 and 66/78 embryos, respectively, had green signals in the trunk muscle. For the four selected kidney enhancers, 47/82, 35/67, 44/62, 15/42 and 56/110 embryos, respectively, had green signals in the kidney region.
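The embryo counts in this caption translate directly into per-enhancer GFP-positive fractions. The sketch below uses the kidney counts exactly as listed above and the overall 28 of 32 figure; the function name is hypothetical, and the calculation is shown only as a worked example of the arithmetic.

```python
# Kidney enhancer counts as listed in the caption: (GFP-positive, surviving embryos).
kidney_counts = [(47, 82), (35, 67), (44, 62), (15, 42), (56, 110)]


def gfp_fraction(positive: int, surviving: int) -> float:
    """Fraction of surviving embryos with GFP signal in the expected tissue."""
    if surviving <= 0:
        raise ValueError("need at least one surviving embryo")
    return positive / surviving


kidney_fractions = [gfp_fraction(p, s) for p, s in kidney_counts]

# Overall validation rate reported in the caption: 28 of 32 enhancers.
validation_rate = 28 / 32

for (p, s), frac in zip(kidney_counts, kidney_fractions):
    print(f"{p}/{s} embryos -> {frac:.1%}")
print(f"validated enhancers: {validation_rate:.1%}")
```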
Extended Data Fig. 4 | Single-cell ATAC-seq in zebrafish brain. a, Barcode selection of single cell ATAC-seq. The x axis represents the log value of the number of unique molecular identifiers (UMI); the y axis represents the ratio of fragments in promoter regions; the red lines represent threshold, and the grey shadows represent that the barcode passed the filter. b, Genomic distribution of all differentially accessible (DA) peaks. c, Overlap of all differentially accessible peaks with enhancers predicted in bulk brain. d, Top, the cluster distribution in the t-SNE projection. Bottom left, pileups of differentially accessible ATAC-seq signals for each cluster. Shown in the figure is the ±10-kb flanking region surrounding peak centres. Bottom right, most significantly enriched transcription factor motif for each cluster. e, t-SNE projection of all scATAC-seq cells coloured by Z-score of peak enrichment. f, Motif enrichment of known neuron-specific TFs in scATAC-seq predicted clusters (n = 19,955).
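The barcode selection in panel a amounts to a two-threshold filter on the log UMI count and the fraction of fragments in promoters. The following sketch illustrates the idea only: the caption does not state the threshold values, so the cut-offs, the log base and the barcode records here are all assumptions.

```python
from math import log10
from typing import NamedTuple


class Barcode(NamedTuple):
    name: str
    n_umis: int          # unique molecular identifiers assigned to the barcode
    n_in_promoters: int  # fragments overlapping promoter regions


# Hypothetical cut-offs; the caption shows thresholds but does not give values.
MIN_LOG10_UMIS = 3.0
MIN_PROMOTER_RATIO = 0.1


def passes_filter(bc: Barcode,
                  min_log10_umis: float = MIN_LOG10_UMIS,
                  min_promoter_ratio: float = MIN_PROMOTER_RATIO) -> bool:
    """Keep a barcode only if it clears both the depth and the promoter-ratio threshold."""
    if bc.n_umis <= 0:
        return False
    promoter_ratio = bc.n_in_promoters / bc.n_umis
    return log10(bc.n_umis) >= min_log10_umis and promoter_ratio >= min_promoter_ratio


# Toy barcodes (invented values).
barcodes = [
    Barcode("cell_a", 5_000, 900),   # deep and promoter-enriched
    Barcode("cell_b", 200, 60),      # too few UMIs
    Barcode("cell_c", 8_000, 240),   # deep but low promoter ratio
    Barcode("empty", 0, 0),          # no signal
]

kept = [bc.name for bc in barcodes if passes_filter(bc)]
print(kept)
```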
Extended Data Fig. 5 | Heterochromatin annotation in adult tissues. a, WashU Epigenome Browser screenshot of H3K9me3 and H3K9me2 histone ChIP-seq signals in 11 zebrafish adult tissues. The values on the y axis were input-normalized. b, Distribution of H3K9me3 and H3K9me2 sites in the zebrafish genome. c, Venn diagram shows the overlap between H3K9me3 and H3K9me2 sites in zebrafish genome. d, Overlapping percentile of H3K9me3 and H3K9me2 peaks in adult tissues. e, H3K9me3 and H3K9me2 sites were depleted of ATAC-seq, H3K4me3 and H3K27ac ChIP-seq signals (n = 68,789 H3K9me3 sites and n = 73,777 H3K9me2 sites). f, Overlap of H3K9me3 sites, H3K9me2 sites, and ATAC-seq peaks with repetitive elements (the total number of each bar, from left to right, 68,789, 73,777 and 436,036). g, Examples of H3K9me3 sites in one tissue found to be active regions in other tissues. Horizontal scale 0-20 for H3K27ac and H3K4me3, 0-10 for RNA-seq, 0-5 for H3K9me3 and H3K9me2.
Extended Data Fig. 6 | DNA methylation level and distribution in adult tissues. a, Fraction of total CpGs with low (<25%), medium (≥25% and <75%), and high (≥75%) methylation levels and mean CpG methylation levels (mCG/CG) in zebrafish adult tissues (the mCG/CG ratio, from left to right, 0.788, 0.859, 0.790, 0.777, 0.791, 0.797, 0.781, 0.777, 0.804, 0.789, 0.781). b, Distribution of CpG methylation levels across zebrafish adult tissues. c, The distribution of non-CpG methylation in 11 adult tissues. d, Mean methylation levels of the tissue-specific gene promoters. n represents the number of tissue-specific gene promoters. e, Mean methylation level of CpGs overlapping different genomic features or repetitive element classes. CDS, coding sequence. f, Number of UMRs and LMRs in zebrafish tissues and their overlap with enhancers and promoters (left panel) (number of UMR and LMR, from top to bottom, 14,990, 10,569, 14,569, 14,587, 14,831, 14,289, 13,842, 13,569, 14,424, 14,374, 13,908, 30,009, 7,916, 19,038, 21,411, 22,591, 16,796, 14,961, 16,268, 17,481, 15,932, 15,665) and ATAC-seq peaks (right panel) (numbers of UMR and LMR are the same as in the left panel). g, Clustering of tissue-specific hypoDMRs. Values in the heat map are mean methylation levels of hypoDMRs (n = 17,654, number of tissue-specific hypoDMRs).
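The low, medium and high categories in panel a follow fixed cut-offs (<25%, ≥25% and <75%, ≥75%). A minimal sketch of that binning is shown below. The per-CpG read counts are invented, and the mean level is computed here as the average of per-site levels, which is an assumption; the caption does not say how mCG/CG was aggregated.

```python
def methylation_category(level: float) -> str:
    """Assign a CpG to low (<25%), medium (>=25% and <75%) or high (>=75%)."""
    if not 0.0 <= level <= 1.0:
        raise ValueError("methylation level must lie between 0 and 1")
    if level < 0.25:
        return "low"
    if level < 0.75:
        return "medium"
    return "high"


def summarize(cpgs):
    """cpgs: iterable of (methylated_reads, total_reads) per CpG site.

    Returns (fraction of sites per category, mean per-site methylation level).
    Sites without coverage are skipped.
    """
    counts = {"low": 0, "medium": 0, "high": 0}
    levels = []
    for methylated, total in cpgs:
        if total == 0:
            continue
        level = methylated / total
        levels.append(level)
        counts[methylation_category(level)] += 1
    if not levels:
        raise ValueError("no covered CpG sites")
    n = len(levels)
    fractions = {name: count / n for name, count in counts.items()}
    return fractions, sum(levels) / n


# Toy WGBS calls (invented values); the last site has no coverage.
toy_cpgs = [(28, 30), (30, 30), (2, 31), (15, 30), (27, 29), (0, 0)]
fractions, mean_level = summarize(toy_cpgs)
print(fractions, round(mean_level, 3))
```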
Extended Data Fig. 7 | De novo assembly of zebrafish chromosome 4 of the Tübingen strain. a, WashU Epigenome Browser snapshot showing that heterochromatic marks H3K9me2 and H3K9me3 signals were enriched on chromosome 4 in zebrafish testis. The values on the y axis were input-normalized. b, H3K9me2, H3K9me3, and DNA methylation level on chr4 long arm are significantly higher than other regions in all tissues (n = 11, two-sided, t-test). c, Overall strategy of de novo assembly of the Tübingen chr4 by integrating 10X, Nanopore, Bionano, and Hi-C data. d, Bionano long molecule sequencing data shows that there were many SVs on chr4 when mapped to the GRCz11 reference genome. e, SVs on chr4 detected by Bionano when the data were mapped to the de novo assembled chr4.
Extended Data Fig. 8 | Conservation of cis-regulatory elements from zebrafish to other vertebrates. a, Percentage of zebrafish enhancers whose sequences were conserved in human (the number of each bar, from left to right, 13,307, 7,018, 11,940, 7,499, 14,783, 14,272, 8,995, 13,777, 10,757, 15,505, 1,734, 4,011, 5,247). b, c, Similar to Fig. 4a. Percentage of zebrafish exons and cis-regulatory elements that have orthologous sequences in mouse and other fish species. Total number of each bar, from left to right: 1,000, 25,593, 58,065, 1,000. For exons and random, we randomly sampled 1,000 elements and computed their conservation percentage. The simulations were performed 20 times and the average percentage is presented. d, Another example of an ultra-conserved noncoding element (UCNE). This element (FOXP1_Finn_1) is predicted to be a muscle enhancer in zebrafish, mouse, and human. Grey vertical bar marks the ultra-conserved region. Red vertical bar is the enhancer sequence in the human genome that was validated as a limb enhancer by transgenic mouse reporter assay in the VISTA Enhancer Browser (#hs956).
Extended Data Fig. 9 | Distal ATAC-seq peak-to-gene pairs, enhancer-to-gene pairs, and transcriptional regulation network. a, b, Distance distribution of cis-regulatory elements to their linked gene TSS. c, Correlation of ATAC-seq peak-to-gene pairs and enhancer-to-gene pairs (n from left to right = 3,292, 3,827, 3,544, 3,281, 3,008, 2,795, 2,357, 2,001, 1,106). d, Validation of predicted enhancer-to-gene pairs by Hi-C interaction counts in muscle. e, mef2d is a regulator in both zebrafish muscle and heart, but it regulates different downstream targets by motif prediction analysis. f, The overall structure of the regulatory network is conserved between human and zebrafish. FFL connection analysis was performed. In this analysis, there are three types of nodes: A, driver node that regulates B and C; B, middle node, regulated by A but regulating node C; C, passenger node, regulated by both A and B.
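The three node roles in panel f describe a feed-forward loop (FFL): A regulates B and C, and B also regulates C. A minimal sketch of how such triples can be enumerated from a directed edge list follows. It is not the authors' implementation, and the node names and edges are placeholders rather than data from the study.

```python
def feed_forward_loops(edges):
    """Return sorted (driver, middle, passenger) triples from directed edges.

    A triple (a, b, c) qualifies when a -> b, b -> c and a -> c all exist
    and the three nodes are distinct.
    """
    targets = {}
    for source, target in edges:
        targets.setdefault(source, set()).add(target)

    loops = []
    for a, a_targets in targets.items():
        for b in a_targets:
            if b == a:
                continue  # ignore self-regulation
            for c in targets.get(b, ()):
                if c != a and c != b and c in a_targets:
                    loops.append((a, b, c))
    return sorted(loops)


# Placeholder regulatory edges (regulator, target); not taken from the paper.
toy_edges = [
    ("TF_A", "TF_B"),
    ("TF_A", "gene_C"),
    ("TF_B", "gene_C"),
    ("TF_B", "gene_D"),   # no TF_A -> gene_D edge, so no loop through gene_D
    ("TF_A", "TF_A"),     # self-loop, ignored
]

loops = feed_forward_loops(toy_edges)
print(loops)
```

In each returned triple the first element plays the driver role, the second the middle role and the third the passenger role, matching the A, B, C definitions in the caption.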
Extended Data Fig. 10 | Compartment and TADs in zebrafish. a, Heat map of genome-wide Hi-C interaction matrices in zebrafish brain (blue) and muscle (red). b, Active marks (H3K4me3, H3K27ac, and ATAC-seq) were enriched in compartment A and depleted in compartment B. Repressive marks (H3K9me2 and H3K9me3) were enriched in compartment B. Error bands represent standard error of the mean. c, Genome browser snapshot of A/B compartment in brain and muscle. The blue vertical shaded area marks a region that is located in compartment B in brain but in compartment A in muscle. As expected, the A compartment is associated with more ATAC-seq peaks, H3K27ac and RNA-seq signals. d, Examples of shared TADs between zebrafish brain and muscle. e, Average DI scores surrounding TAD boundaries identified in brain (upper panel) and muscle (lower panel). f, ChIP-seq data shows that CTCF binding sites were enriched at TAD boundaries. g, Footprint analysis of ATAC-seq peaks in the TAD boundaries shows enrichment of CTCF binding motif (number of each bar, from left to right, 0.213, 0.24, 0.22, 0.237, 0.251, 0.232, 0.24, 0.262, 0.271, 0.281, 0.37, 0.27, 0.253, 0.25, 0.252, 0.253, 0.26, 0.23, 0.238, 0.24, 0.22). h, Repetitive elements enriched at TAD boundaries (left panel) and loop anchors (right panel).
Extended Data Fig. 11 | Comparing zebrafish evolutionary breakpoints with TAD annotation. a, Similar to Fig. 5d. Enrichment of evolutionary breakpoints at TAD boundaries. Relative positions of evolutionary breakpoints to TADs in 15 vertebrates. In all cases, we found that the evolutionary breakpoints were enriched at zebrafish TAD boundaries and depleted from the centre of TADs. Grey vertical bar labels the TAD body area. b, By comparing zebrafish with 17 vertebrates, H3K4me3 signals were found to be more enriched at TAD boundaries with breakpoints than those without breakpoints. Orange vertical bar labels the TAD boundaries. c, Higher H3K4me3 levels at breakpoint-containing TAD boundaries were also found when using TAD annotation from zebrafish muscle, similar to Fig. 5g. d, H3K4me3 enrichment in human ESCs (H1) TAD boundaries with or without zebrafish-to-human breakpoints. e, H3K4me3 enrichment in mouse ESCs TAD boundaries with or without zebrafish-to-mouse breakpoints. f, H3K4me3 enrichment in human ESCs (H1) TAD boundaries with or without mouse-to-human breakpoints.
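The enrichment of breakpoints at TAD boundaries can be quantified by measuring each breakpoint's distance to the nearest boundary and counting those within a window. The sketch below assumes a single chromosome, invented coordinates and a hypothetical 50-kb window; the text reports that breakpoints were "near" boundaries without stating the distance used.

```python
from bisect import bisect_left


def distance_to_nearest(position, sorted_boundaries):
    """Distance from a position to the closest entry of a sorted boundary list."""
    i = bisect_left(sorted_boundaries, position)
    candidates = sorted_boundaries[max(i - 1, 0):i + 1]
    return min(abs(position - b) for b in candidates)


def fraction_near_boundaries(breakpoints, boundaries, window):
    """Fraction of breakpoints lying within `window` bp of any boundary."""
    ordered = sorted(boundaries)
    near = sum(1 for bp in breakpoints if distance_to_nearest(bp, ordered) <= window)
    return near / len(breakpoints)


# Toy coordinates on one chromosome (invented values).
tad_boundaries = [1_000_000, 2_000_000, 3_500_000]
breakpoints = [1_020_000, 1_500_000, 3_490_000, 2_900_000]
WINDOW = 50_000  # hypothetical definition of "near"

toy_fraction = fraction_near_boundaries(breakpoints, tad_boundaries, WINDOW)
print(f"toy fraction near boundaries: {toy_fraction:.2f}")

# The proportion reported in the main text: 984 of 1,223 zebrafish-to-human breakpoints.
reported = 984 / 1223
print(f"reported: {reported:.1%}")
```

Keeping the boundary list sorted lets each lookup use a binary search, which matters once the comparison is repeated across many species and thousands of breakpoints.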
Extended Data Fig. 12 | TADs with and without breakpoints. a, H3K27ac and ATAC-seq signals do not show differences at TAD boundaries with breakpoints compared to those without breakpoints. Orange vertical bar labels the TAD boundaries. b, Sizes of TADs with and without evolutionary breakpoints were similar (n = 573, 777, two-sided, t-test). c, Enrichment of transcription at breakpoints (BP) that overlap with CTCF TAD boundaries in K562 cells (the number of breakpoints in blue line is 639, red line is 625). d, In 17 vertebrates, TADs without evolutionary breakpoints (bottom panel) have stronger interaction frequencies in the middle than TADs with evolutionary breakpoints (upper panel). Breakpoints in these 17 vertebrates were defined by comparing their genomes to the zebrafish genome. e, Distribution of correlations between the expression pattern of each pair of paralogs across 11 adult zebrafish tissues. f, Correlations between pairs of paralogs located on the same chromosome. Among them, 17 pairs were located within the same TAD, and the rest of the 65 pairs were located in different TADs. As a control, we randomly sampled 100 genes. Number of each bar, from left to right, 17, 65, 100.
nature research | reporting summary
Corresponding author(s): Yue
Last updated by author(s): 07/05/2020
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see our Editorial Policies and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Deeptools 3.3.1
GREAT 3.0.0
Tophat2 2.1.0
Cufflinks 2.2.1
CPAT v1.2.3
TransDecoder V5.1.0 (https://transdecoder.github.io/)
blast/2.7.1
biomaRt 2.36.0
Cutadapt 2.5
picard 1.126 (http://broadinstitute.github.io/picard/).
ChIPseeker 1.19.1
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors and
reviewers. We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
- A description of any restrictions on data availability
Next-generation sequencing data have been deposited in the Gene Expression Omnibus (GEO) under accession number GSE134055.
The genome browser link: https://epigenome.wustl.edu/zebrafishENCODE/
The human h1-ESC Hi-C data were downloaded from GSE52457.
GM12878 and K562 GRO-seq data were downloaded from GSE60456.
GM12878 and K562 CTCF ChIP-seq were downloaded from GSE31477.
GM12878 and K562 Pol2 ChIP-seq were downloaded from GSE91426 and GSE31477.
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf
Sample size The number of adult zebrafish used for each tissue was determined in order to obtain enough tissue for 4 ChIP-seq, RNA-seq, ATAC-seq,
WGBS and Hi-C experiments. In all, 160 datasets including 11 WGBS, 26 RNA-seq, 95 ChIP-seq, 22 ATAC-seq, 4 Hi-C and 2 single-cell ATAC-seq
were used in this study.
Data exclusions This is not relevant since we used all sequencing data in this study
Replication We have two replicates for each ChIP-seq (only one replicate for Kidney H3K27ac), RNA-seq, ATAC-seq, Hi-C and single-cell ATAC-seq experiment. We
performed two technical replicates for Whole Genome Bisulfite Sequencing (WGBS) and reached 30X coverage. We calculated the Pearson
correlation coefficient between two biological replicates using the read counts of 10 kb-binned matrices. The correlation scores of all
replicates are listed in supplemental Table 1.
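The replicate comparison described above (Pearson correlation of read counts in 10 kb bins) can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used for the study: the function names, the representation of reads as (chromosome, position) pairs and the choice to ignore bins that are empty in both replicates are all assumptions made for the example.

```python
import math
from collections import Counter

BIN_SIZE = 10_000  # 10 kb bins, as stated in the text


def bin_counts(read_positions, bin_size=BIN_SIZE):
    """Count reads per (chromosome, bin index) from (chrom, position) pairs."""
    return Counter((chrom, pos // bin_size) for chrom, pos in read_positions)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def replicate_correlation(reads_rep1, reads_rep2, bin_size=BIN_SIZE):
    """Correlate binned read counts of two replicates.

    Assumption: bins with no reads in either replicate are not included.
    """
    counts1 = bin_counts(reads_rep1, bin_size)
    counts2 = bin_counts(reads_rep2, bin_size)
    bins = sorted(set(counts1) | set(counts2))
    # Counter returns 0 for bins that are missing in one replicate.
    return pearson_r([counts1[b] for b in bins], [counts2[b] for b in bins])


if __name__ == "__main__":
    rep1 = [("chr1", 100), ("chr1", 200), ("chr1", 15_000), ("chr2", 5)]
    rep2 = rep1 + [("chr1", 9_999), ("chr2", 12_000)]
    print(round(replicate_correlation(rep1, rep2), 3))
```

In practice the counts would come from aligned reads or from an existing binned matrix; whether empty bins are kept changes the coefficient, so that choice should be reported alongside the score.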
Blinding Blinding was not relevant to our study since we did not have experimental groups to compare.
Antibodies
Antibodies used Rabbit polyclonal histone H3K27ac antibody (Active Motif, 39133) 1:100
Rabbit polyclonal histone H3K4me3 antibody (EMD Millipore, 07-473) 1:100
Rabbit monoclonal histone H3K9me2 antibody (Cell Signaling, 4658) 1:100
Rabbit polyclonal histone H3K9me3 antibody (Abcam, ab8898) 1:100
Validation The four primary antibodies are commercial antibodies against histone H3 modifications, validated as ChIP
grade by the manufacturers (Active Motif, EMD Millipore, Cell Signaling and Abcam):
https://www.activemotif.com/catalog/details/39133
https://www.emdmillipore.com/US/en/product/Anti-trimethyl-Histone-H3-Lys4-Antibody,MM_NF-07-473
https://www.cellsignal.com/products/primary-antibodies/di-methyl-histone-h3-lys9-d85b4-xp-rabbit-mab/4658
https://www.abcam.com/histone-h3-tri-methyl-k9-antibody-chip-grade-ab8898.html
Furthermore, these antibodies have been validated by various labs for ChIP-seq in zebrafish and in many other species in which
the histones have the exact same amino acid sequence (PMIDs: 31794598, 24286030, 31495082, 30948728)
Wild animals The study did not involve animals in the wild.
Field-collected samples The study did not involve animals collected from the field.
Ethics oversight All procedures on live animals were approved by the Institutional Animal Care and Use Committee (IACUC) at the Pennsylvania State
University, ID: PRAMS201445659
Note that full information on the approval of the study protocol must also be provided in the manuscript.
ChIP-seq
Data deposition
April 2020
Confirm that both raw and final processed data have been deposited in a public database such as GEO.
Confirm that you have deposited or provided access to graph files (e.g. BED files) for the called peaks.
GSE134055_YueLab-ChIP-seq-Colon_H3K27ac_merged.narrowPeak.gz
GSE134055_YueLab-ChIP-seq-Colon_H3K4me3_merged.narrowPeak.gz
Methodology
Replicates Brain_RNA-seq 2 0.888
Blood_RNA-seq 2 0.946
Colon_RNA-seq 2 0.908
Heart_RNA-seq 2 0.849
Intestine_RNA-seq 2 0.853
E-brain_RNA-seq 2 0.911
E-trunk_RNA-seq 2 0.875
Kidney_RNA-seq 2 0.91
Liver_RNA-seq 2 0.871
Muscle_RNA-seq 2 0.871
Skin_RNA-seq 2 0.916
Spleen_RNA-seq 2 0.852
Testis_RNA-seq 2 0.953
Brain_h3K27ac 2 0.97
Colon_h3K27ac 2 0.93
Blood_h3K27ac 2 0.91
Heart_h3K27ac 2 0.97
Testis_WGBS 1
Brain_HiC 2 0.981
Muscle_HiC 2 0.976
Brain_scATAC-seq 0.925
Intestine_RNA-seq_rep1 62,593,691 paired-end 60
E-brain_RNA-seq_rep1 81,655,736 paired-end 60
Skin_WGBS_rep1 429,245,446 paired-end 150
Spleen_WGBS_rep1 409,400,141 paired-end 150
Brain_HiC_rep2 546,389,510 paired-end 150
Muscle_HiC_rep2 331,791,203 paired-end 150
Antibodies Rabbit polyclonal histone H3K27ac antibody (Active Motif, 39133) Lot. 31814008 Dilution ratio. 1:100
Rabbit polyclonal histone H3K4me3 antibody (EMD Millipore, 07-473) clone MC135. Lot. 2591879 Dilution ratio. 1:100
Rabbit monoclonal histone H3K9me2 antibody (Cell Signaling, 4658) Lot. GR3247768-1 Dilution ratio. 1:100
Rabbit polyclonal histone H3K9me3 antibody (Abcam, ab8898) Lot. GR3247768-1 Dilution ratio. 1:100
Peak calling parameters H3K27ac and H3K4me3: macs2 callpeak -f BED -n -p 1e-2 --nomodel --shift 0 --extsize 150 --keep-dup all -B --SPMR
ATAC-seq: macs2 callpeak --shift 75 --extsize 150_window --nomodel -B --SPMR --keep-dup all --call-summits
H3K9me3 and H3K9me2: findPeaks -style histone -region -size 1000 -minDist 5000
20386 heart.h3k4me3xheart.input.peak_region.rpkm.filter_FC2_change_1
22039 intestine.h3k4me3xintestine.input.peak_region.rpkm.filter_FC2_change_1
26275 kidney.h3k4me3xkidney.input.peak_region.rpkm.filter_FC2_change_1
21490 liver.h3k4me3xliver.input.peak_region.rpkm.filter_FC2_change_1
23063 muscle.h3k4me3xmuscle.input.peak_region.rpkm.filter_FC2_change_1
18720 skin.h3k4me3xskin.input.peak_region.rpkm.filter_FC2_change_1
20429 spleen.h3k4me3xspleen.to.h3k4me3.input.peak_region.rpkm.filter_FC2_change_1
32333 testis.h3k4me3xtestis.input.peak_region.rpkm.filter_FC2_change_1
13171 brain_merge_scATAC.10_peaks.narrowPeak
123336 brain_merge_scATAC.11_peaks.narrowPeak
127338 brain_merge_scATAC.12_peaks.narrowPeak
Article
1Department of Cellular and Molecular Medicine, University of California San Diego, La Jolla, CA, USA. 2Howard Hughes Medical Institute, Chevy Chase, MD, USA. 3Institute of Pharmaceutical
Chemistry, Goethe-Universität, Frankfurt, Germany. 4Division of Biological Sciences, Molecular Biology Section, University of California San Diego, La Jolla, CA, USA. 5Small Molecule Discovery
Program, Ludwig Institute for Cancer Research, La Jolla, CA, USA. 6Division of Biological Sciences, Cell and Developmental Biology Section, University of California San Diego, La Jolla, CA,
USA. 7Present address: Genomics Institute of the Novartis Research Foundation, La Jolla, CA, USA. 8Present address: Department of Biological Sciences, Indian Institute of Science Education
and Research Mohali, Mohali, India. 9Present address: La Jolla Institute for Immunology, La Jolla, CA, USA. 10Present address: Sir William Dunn School of Pathology, Oxford University, Oxford,
UK. 11These authors contributed equally: C. K. Deniston, J. Salogiannis, S. Mathea. ✉e-mail: sreckpeterson@ucsd.edu; aleschziner@ucsd.edu
Fig. 1 | Cryo-EM structure of LRRK2RCKW. a, Schematic of the construct used in this study. The N-terminal half of LRRK2, absent from our construct, is shown in dim colours. The same colour-coding of domains is used throughout the Article. The five major familial mutations in Parkinson’s disease and a mutation linked to Crohn’s disease are indicated. b, c, A 3.5 Å cryo-EM map (b) and local resolution (c) of the LRRK2RCKW trimer, with one monomer highlighted. d, e, A 3.8 Å cryo-EM map (d) and local resolution (e) of a LRRK2RCKW monomer with improved resolution for the ROC and COR-A domains. f, Ribbon diagram of the atomic model of LRRK2RCKW. g, A 8.1 Å cryo-EM map of monomeric LRRK2RCKW with the model in f docked in. h, Location of the Parkinson’s and Crohn’s disease mutations listed in a. i, j, Interface between the C-terminal helix and the kinase domain in LRRK2RCKW with residues involved in electrostatic and hydrophobic interactions indicated.
e, Extended Data Fig. 2a–d). The final model was generated using the signal-subtracted maps of the ROC and COR-A domains, and then combined with the COR-B, kinase and WD40 domains from the trimer map (Fig. 1f, Extended Data Fig. 2e–m, Supplementary Video 1). Our model fits well into an 8.1 Å reconstruction we obtained of a LRRK2RCKW monomer (Fig. 1g, Extended Data Fig. 6), which indicates that trimer formation does not cause major structural changes in the protein.
LRRK2RCKW adopts an overall J-shape, with the WD40, kinase and COR-B domains arranged along one axis, and COR-A and ROC turning around back towards the kinase. This brings the COR-A and the tightly associated ROC domain into close proximity to the kinase C-lobe (Fig. 1f, Supplementary Video 1). This arrangement probably underpins the crosstalk between the LRRK2 kinase and GTPase15,16. Part of the FERM domain in the FAK–FERM complex approaches the FAK C-lobe in a similar way17 (Extended Data Fig. 3a, b). The ROC, COR-A and COR-B domains are arranged as seen in crystal structures of LRRK2 bacterial homologues6,7,18. The N-lobe of the kinase domain, in particular its αC helix, forms an extensive interaction with the COR-B domain, with COR-B occupying a location similar to cyclin A in CDK2–cyclin A19 (Extended Data Fig. 3a, c).
The kinase in our LRRK2RCKW structure is in an open, inactive conformation. Its activation loop contains the site of two familial mutations found in Parkinson’s disease (G2019S and I2020T) and is disordered beyond G2019 (Fig. 1h, Extended Data Fig. 2h, Supplementary Video 2). R1441 and Y1699 are the sites of three other familial Parkinson’s disease mutations and are located at the ROC and COR-B interface (Fig. 1h, Extended Data Fig. 2j, Supplementary Video 2). Because the kinase and GTPase interact with each other via the COR-A domain, it is possible that these mutations, located at the interface between the GTPase and COR-B, alter the conformational landscape of LRRK2 in response to ligands and/or regulatory signals and therefore affect the crosstalk between the LRRK2 catalytic domains.
A unique feature of LRRK2 is a 28-amino-acid α-helix located at its extreme C terminus, after the WD40 domain (Fig. 1i, j, Supplementary Video 3). Although several other kinases have α-helices in the same general location, none form interactions as extensive as those observed in LRRK2 (Fig. 1i, j, Extended Data Fig. 3d–i). Deletion of this helix resulted in an insoluble protein (Extended Data Fig. 1a, b). A residue near its end (T2524) is a known phosphorylation site for LRRK220. Owing to the close proximity between T2524 and the N-lobe of the kinase domain, as well as the adjacent COR-B domain, we hypothesize that phosphorylation of this residue may be involved in regulation of the kinase. Because the last two residues of the C-terminal helix are disordered in our structure, as is a neighbouring loop in COR-B, it is possible that conditions exist in which these regions become ordered and turn the C-terminal helix into a scaffolding element that connects COR-B, the kinase and the WD40 domains.
We modelled the leucine-rich repeats (LRR) into LRRK2RCKW by using a crystal structure of the LRR, ROC and COR domains of the Chlorobium tepidum Roco protein7 (Extended Data Fig. 3l–p). In our model, the LRR wraps around the N-lobe of the kinase and approaches the C-lobe, placing the known S1292 autophosphorylation site in the LRR close to the active site of the kinase, and the Crohn’s-disease-associated residue N208121, located in the kinase C-lobe, next to the LRR (Extended Data Fig. 3q), suggesting the functional relevance of this predicted interface.
Model of microtubule-bound filaments
A 14 Å structure of microtubule-associated filaments of full-length LRRK2 (carrying the filament-promoting I2020T mutation12) was recently determined using in situ cryo-ET and subtomogram analysis5 (Fig. 2a). The LRRK2 filaments formed on microtubules are right-handed5. Because microtubules are left-handed and no strong density connected the LRRK2 filament to the microtubule surface5, it is not known whether the LRRK2 microtubule interaction is direct. To address this, we combined purified microtubules and LRRK2RCKW, either wild type or I2020T, and imaged them by cryo-EM. Both wild-type and I2020T mutant LRRK2RCKW bound to microtubules, and diffraction
Fig. 2 | Modelling the microtubule-associated LRRK2 filaments. a, A 14 Å cryo-ET map of a segment of microtubule-associated LRRK2 filament in cells. The microtubule is shown in blue and the LRRK2 filament in grey. b, Microtubule-associated LRRK2RCKW filaments reconstituted in vitro from purified components. Top, single cryo-EM images of a naked microtubule (left), and wild-type (WT) (centre) and I2020T (right) LRRK2RCKW filaments. Bottom, diffraction patterns (power spectra) calculated from the images above. Filled and open arrowheads indicate the layer lines corresponding to the microtubule and LRRK2RCKW, respectively. Scale bar, 20 nm. c, Fitting of the LRRK2RCKW structure, which has its kinase in an open conformation, into the cryo-ET map. d, Atomic model of the LRRK2RCKW filaments from c. The white circle highlights the filament interface mediated by interactions between COR domains, where clashes are found. e, Superposition of the LRRK2RCKW structure (coloured by domains) and a model of LRRK2RCKW with its kinase in a closed conformation in blue. The dashed blue arrow indicates the closing of the kinase. f, Fitting of the closed-kinase model of LRRK2RCKW into the cryo-ET map. g, Atomic model of the closed-kinase LRRK2RCKW filaments from f with a white circle highlighting the same interface as in d. h, i, Cartoon representation of the two filament models, highlighting the clashes observed with open-kinase LRRK2RCKW (h) and resolved with the closed-kinase model (i). In total, 82% of clashes were resolved using the closed-kinase LRRK2RCKW model (Methods).
patterns calculated from the images showed layer lines consistent with the formation of ordered filaments (Fig. 2b). Therefore, the interaction between LRRK2 and microtubules is direct and the catalytic C-terminal half of LRRK2 is sufficient for the formation of microtubule-associated filaments. The layer line patterns of wild-type and I2020T mutant LRRK2RCKW are different, with the I2020T diffraction pattern having an additional layer line of lower frequency, which indicates longer-range order in the filaments (Fig. 2b). This is consistent with the observation that the I2020T mutation promotes microtubule association by LRRK2 in cells12. Understanding the structural basis for this effect will require high-resolution structures of the filaments formed by wild-type and I2020T mutant LRRK2.
Integrative modelling was previously used to build a model into the in situ structure of microtubule-associated LRRK25. This modelling indicated that the well-resolved cryo-ET density closest to the microtubule consisted of the ROC, COR, kinase and WD40 domains and gave orientation ensembles for each domain5 that are in good agreement with our high-resolution structure of LRRK2RCKW (Extended Data Fig. 4a). Here, we built an atomic model of the microtubule-bound LRRK2 filaments by combining our 3.5 Å structure of LRRK2RCKW with the 14 Å in situ structure of microtubule-associated LRRK2 (Extended Data Fig. 4b–f). This showed that the LRRK2RCKW structure is sufficient to account for the density seen in the in situ structure (Fig. 2c), in agreement with our ability to reconstitute microtubule-associated LRRK2RCKW filaments in vitro (Fig. 2b), and with the earlier modelling5.
Although our LRRK2RCKW structure fits the overall shape of the cryo-ET map, there were notable clashes at the COR domain interfaces (Fig. 2d). Because the kinase in our LRRK2RCKW structure is in an open conformation, we hypothesized that filament formation might require the LRRK2 kinase to be in a closed conformation. To test this, we modelled a kinase-closed LRRK2RCKW (Fig. 2e, Extended Data Fig. 4g–j) and used it to rebuild the LRRK2 filament. The kinase-closed LRRK2RCKW model resolved more than 80% of the backbone clashes we had observed with our kinase-open LRRK2RCKW structure (Fig. 2c, d, f, g). A closed conformation for the kinase was also proposed by the integrative modelling5. Given these data, we hypothesize that the conformation of LRRK2 controls its ability to oligomerize on microtubules, with a closed kinase promoting oligomerization and an open (inactive) one disfavouring it (Fig. 2h, i).
The LRRK2 filaments in our kinase-closed model are formed by two homotypic interactions: one is mediated by the WD40 domain and the other by the COR-A and COR-B domains (Fig. 3a–d). Similar interfaces were reported on the basis of the cryo-ET structure5. We also solved structures of LRRK2RCKW dimers, using the same grids that yielded the 3.5 Å structure of LRRK2RCKW. We obtained structures of both the WD40–WD40- and COR–COR-mediated dimers, which indicates that both interfaces mediate dimerization in the absence of microtubules (Fig. 3e, f, Extended Data Figs. 5, 6). The interface in the COR-mediated dimer of LRRK2RCKW differs from that reported for the C. tepidum Roco protein6,7; although the GTPase domains interact directly in the dimer of the bacterial protein7, they are not involved in the dimerization interface we observed for LRRK2 (Extended Data Fig. 6c).
We built an independent model of a closed-kinase LRRK2RCKW by splitting our 3.5 Å structure in half at the junction between the N- and C-lobes of the kinase, and fitting the fragments into our cryo-EM map of a WD40–WD40 dimer obtained in the presence of a LRRK2-specific type I kinase inhibitor MLi-222,23, which is predicted to stabilize the closed conformation of the kinase (Extended Data Fig. 7a–c).
Fig. 3 | LRRK2RCKW forms WD40- and COR-mediated dimers outside the filaments. a–d, The filament model shown in Fig. 2h, i is shown here in grey, with either a WD40-mediated (a), or COR-mediated (c) LRRK2RCKW dimer highlighted with domain colours. The corresponding molecular models are shown next to the cartoons (b, d). e, f, Cryo-EM reconstructions of LRRK2RCKW dimers obtained in the absence of inhibitor (‘apo’), or in the presence of MLi-2. For each reconstruction, two orientations of the map are shown: down the two-fold axis at the dimerization interface (left), which matches the orientation of the models shown in b and d, and perpendicular to it (right). The top row shows the cryo-EM map and the bottom row a transparent version of it with a model docked in. g, Molecular models of the WD40-mediated and COR-mediated LRRK2RCKW dimers obtained in the presence of MLi-2 (e, f) were aligned in alternating order. This panel shows the resulting right-handed helix. h, The helix has dimensions that are compatible with the diameter of a 12-protofilament microtubule (EMD-5192)44, which was the species used to obtain the cryo-ET map shown in Fig. 2a5, and has its ROC domains pointing towards the microtubule surface.
We then docked this closed-kinase model (Extended Data Fig. 7c) into the cryo-EM maps of WD40- and COR-mediated dimers obtained in the presence of MLi-2 to generate molecular models of both dimers (Extended Data Fig. 7d, e). We aligned these models to build a polymer in silico. This resulted in a right-handed helix with the same general geometric properties seen in the cellular LRRK2 filaments, which indicates that those properties are largely encoded in the structure of LRRK2RCKW itself (Fig. 3g, h, Extended Data Fig. 7f). Docking the same two halves of LRRK2RCKW into the cryo-EM map of a monomer we obtained in the absence of inhibitors or ATP led to a structure similar to our 3.5 Å structure obtained from trimers, further confirming that trimer formation does not alter the conformation of LRRK2RCKW (Extended Data Fig. 7g, h).
These data, along with the apparent lack of any residue-specific interactions between LRRK2 and the microtubule lattice, suggest that the microtubule may provide a surface for LRRK2 to oligomerize on using interfaces that exist in solution, therefore explaining the symmetry mismatch between the microtubule and the LRRK2 filament. Consistent with this, the surface charge of the microtubule facing the LRRK2RCKW filament is acidic, whereas there are basic patches on the LRRK2RCKW filament that face the microtubule (Extended Data Fig. 7i–l). The unstructured C-terminal tails of α- and β-tubulin, which were not included in the surface charge calculations, are also acidic.
LRRK2RCKW inhibits kinesin and dynein
To test our hypothesis that the conformation of the LRRK2 kinase domain regulates its interaction with microtubules, we needed a sensitive assay to measure the association of LRRK2RCKW with microtubules and a means to control the conformation of its kinase. We monitored microtubule association by measuring the effect of LRRK2RCKW on microtubule-based motor motility. We used a truncated human kinesin 1, KIF5B (‘kinesin’)24, which moves towards the microtubule plus end, and the activated human cytoplasmic dynein-1–dynactin–ninein-like complex (‘dynein’)25, which moves in the opposite direction. Using single-molecule in vitro motility assays (Fig. 4a), we found that low nanomolar concentrations of LRRK2RCKW inhibited the movement of both kinesin and dynein, with near complete inhibition at 25 nM LRRK2RCKW (Fig. 4b, c, Extended Data Fig. 8a, b). We hypothesized that LRRK2RCKW was acting as a roadblock for the motors. In agreement with this, the distance that single kinesins moved (run length) was reduced (Fig. 4d), whereas their velocity remained relatively constant (Fig. 4e). We obtained similar results with dynein (Fig. 4f). Other microtubule-associated proteins, such as MAP2 and Tau, also inhibit kinesin, but not dynein26,27, probably owing to the ability of dynein to side-step on the microtubule28–30. The unusual ability of LRRK2 to inhibit dynein may be a consequence of it forming oligomers that cannot be overcome by sidestepping.
We also tested the inhibition of kinesin by I2020T mutant LRRK2RCKW, which promotes the formation of filaments when overexpressed in cells12. I2020T mutant LRRK2RCKW inhibited kinesin to a similar extent as wild-type LRRK2RCKW (Extended Data Fig. 8c, d). Because the in vitro reconstituted filaments of I2020T mutant LRRK2RCKW show longer range
Fig. 4 | LRRK2RCKW inhibits the motility of kinesin and dynein. a, Schematic of the single-molecule motility assay. b, c, The percentage (mean ± s.d.) of motile events per microtubule as a function of LRRK2RCKW concentration for kinesin (b) and dynein (c). ****P < 0.0001, Kruskal–Wallis test with Dunn’s post hoc for multiple comparisons for (b) or Mann–Whitney test (c). d, Cumulative frequency distribution of kinesin run lengths as a function of LRRK2RCKW concentration. Mean decay constants (tau) are shown. The 12.5 nM and 25 nM, but not 6.25 nM, conditions were significantly different (P < 0.0001) than the 0 nM condition (one-way analysis of variance (ANOVA) with Dunnett’s test for multiple comparisons using error generated from a bootstrapping analysis). e, Velocity of kinesin as a function of LRRK2RCKW concentration. Data are mean ± s.d. ****P < 0.0001, one-way ANOVA with Dunn’s post hoc for multiple comparisons. f, Cumulative frequency distribution of dynein run lengths as a function of LRRK2RCKW concentration. Mean decay constants (tau) are shown. Data were resampled with bootstrapping analysis and were significant. P < 0.0001, unpaired t-test with Welch’s correction using error generated from a bootstrapping analysis.
Panel legends ([LRRK2RCKW], tau). d, kinesin: 0 nM, 1.667 μm; 6.25 nM, 1.570 μm; 12.5 nM, 1.048 μm; 25 nM, 0.813 μm. f, dynein: 0 nM, 4.980 μm; 25 nM, 0.846 μm.
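The decay constants (tau) quoted in the Fig. 4 legend summarize run-length distributions, with errors generated from a bootstrapping analysis. As a minimal sketch of that idea, and not the analysis pipeline used in this work, the Python below assumes that run lengths are exponentially distributed, so that the sample mean is the maximum-likelihood estimate of tau, and resamples the runs with replacement to obtain a standard error. The function names, the number of resamples and the simulated data are illustrative assumptions; only the 1.667 μm value is taken from the figure.

```python
import random
import statistics


def decay_constant(run_lengths):
    """Estimate tau as the sample mean.

    Assumption: run lengths follow a single exponential distribution, for
    which the mean is the maximum-likelihood estimate of the decay constant.
    """
    return statistics.fmean(run_lengths)


def bootstrap_decay_constant(run_lengths, n_resamples=1000, seed=0):
    """Return (mean, standard deviation) of tau over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(run_lengths)
    estimates = [
        decay_constant(rng.choices(run_lengths, k=n))  # resample with replacement
        for _ in range(n_resamples)
    ]
    return statistics.fmean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    # Simulated run lengths (micrometres); the rate 1/1.667 mirrors the decay
    # constant shown in Fig. 4d for kinesin without LRRK2RCKW.
    rng = random.Random(42)
    runs = [rng.expovariate(1 / 1.667) for _ in range(500)]
    tau, tau_sd = bootstrap_decay_constant(runs)
    print(f"tau = {tau:.3f} +/- {tau_sd:.3f} um")
```

Real run-length data are truncated by microtubule length and by the detection limit, so a fit to the cumulative frequency distribution, as described in the legend, is usually preferred over the plain mean used here.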
Fig. 5 | Type II, but not type I, kinase inhibitors rescue kinesin and dynein motility and reduce LRRK2 filament formation in cells. a, b, Effects of different kinase inhibitors on LRRK2RCKW’s inhibition of kinesin (a) and dynein (b) motility. Data shown are the percentage of motile events per microtubule (MT) as a function of LRRK2RCKW concentration in the absence (DMSO) or presence of the indicated inhibitors (ponatinib (Pon.) and GZD-824 (GZD): 10 μM; MLi-2 and LRRK2-IN-1 (IN-1): 1 μM). Data are mean ± s.d. ***P < 0.001, ****P < 0.0001, Kruskal–Wallis test with Dunn’s post hoc for multiple comparisons within drug only. DMSO conditions reproduced from Fig. 4c for comparison. c, Treatment with MLi-2 (500 nM) for 2 h increases the formation of wild-type GFP–LRRK2 filaments in 293T cells. Data are mean ± s.d. ****P = 0.0002, Mann–Whitney test. d, Treatment with GZD-824 (5 μM) for 30 min decreases the formation of GFP–LRRK2(I2020T) filaments in 293 cells. Data are mean ± s.d. *P = 0.0133, **P = 0.0012, Kruskal–Wallis with Dunn’s post hoc test for multiple comparisons. e, Schematic of our hypothesis. The LRRK2 kinase can be in an open or closed conformation. The different species we observed are represented in the rounded rectangles, but only monomers are shown on the microtubule for simplicity. Our model proposes that the kinase-closed form of LRRK2 favours oligomerization on microtubules.
Extended Data Fig. 5 | Ab initio models for cryo-EM of LRRK2RCKW dimers and cryo-EM analysis of WD40- and COR-mediated dimers of LRRK2RCKW in the presence of the inhibitor MLi-2. a, An initial dataset was collected from a sample of LRRK2RCKW incubated in the presence of the kinase inhibitor MLi-2 and dimers were selected. b, Representative two-dimensional class averages used for ab initio model building. c, Ab initio models with the structure of LRRK2RCKW docked in. d, Volumes generated from the molecular models in b, filtered to 30 Å resolution. e, Projections of the volumes in d shown in the same order as their corresponding 2D class averages in b. f, Data processing strategy for obtaining cryo-EM structures of WD40- and COR-mediated dimers of LRRK2RCKW in the presence of the inhibitor MLi-2. The models used during this processing (Methods) are those shown in d along with an additional linear trimer (Methods) used for particle sorting.
Extended Data Fig. 6 | Cryo-EM analysis of a monomer and WD40- and COR-mediated dimers of LRRK2RCKW in the absence of inhibitor (apo) and dimerization of LRRK2RCKW outside the filaments. a, Data-processing strategy for obtaining cryo-EM structures of a monomer and WD40- and COR-mediated dimers of LRRK2RCKW in the absence of inhibitor. The models used during the processing of the dimers (Methods) are those shown in Extended Data Fig. 5d, along with an additional linear trimer (Methods) used for particle sorting. The models used for processing of the monomer (Methods) were the same dimer models as in Extended Data Fig. 5d (used for particle sorting) in addition to a monomer model generated from our LRRK2RCKW model (used for refinement). b, Two-dimensional (2D) class averages of WD40- and COR-mediated LRRK2RCKW dimers obtained in the absence of inhibitors (apo) or in the presence of either ponatinib or MLi-2. The same molecular models of the two dimers shown in Fig. 3 are shown on the left but in orientations similar to those represented by the 2D class averages shown here. For each class average, a projection from the corresponding model in the best-matching orientation is shown to its left. c, Two copies of the LRRK2RCKW structure were aligned to the ROC–COR domains of the LRR–ROC–COR structure from the C. tepidum Roco protein (PDB code 6HLU) to replicate the interface observed in the bacterial homologue in the context of the human protein. This panel shows a comparison between the dimer modelled based on the C. tepidum LRR–ROC–COR structure and the dimer observed for LRRK2RCKW in this work. Although the bacterial structure shows a dimerization interface that involves the GTPase (ROC), LRRK2RCKW interacts exclusively through its COR-A and -B domains, with the ROC domains located away from this interface. The two arrangements are shown schematically in cartoon form below the structures.
Extended Data Fig. 7 | Properties of the microtubule-associated LRRK2RCKW filaments. a, b, The LRRK2RCKW structure solved in this work (a) was split at the junction between the N- and C-lobes of the kinase domain (L1949-A1950) (b). c, Docking of the two halves of LRRK2RCKW into a cryo-EM map of a LRRK2RCKW dimer solved in the presence of MLi-2. The dimer map is the same one shown in Fig. 3 and Extended Data Figs. 10 and 11. d, The model obtained in c was docked into cryo-EM maps of either WD40- or COR-mediated dimers obtained in the presence of MLi-2. e, Molecular models resulting from the docking in d. f, Aligning, in alternating order, copies of the dimer models generated in d and e results in a right-handed filament with dimensions compatible with those of a microtubule, and its ROC domains pointing inwards (see Fig. 3g, h for more details). g, Docking of the two halves of LRRK2RCKW into a cryo-EM map of a LRRK2RCKW monomer solved in the absence of inhibitor (apo). The map is the one shown in Fig. 1g and Extended Data Fig. 6. h, Three-way comparison of LRRK2RCKW (with domain colours) and the models resulting from the dockings into the MLi-2 WD40-mediated dimer map (c) (dark blue) and apo monomer map (g) (light blue). The three structures were aligned using the C-lobes of their kinases and the WD40 domain. The superposition illustrates that the docking into the apo map results in a structure very similar to that obtained from the trimer (Fig. 1) and that the presence of MLi-2 leads to a closing of the kinase. i, Molecular model of the microtubule-associated LRRK2RCKW filament obtained by docking a fragment of a microtubule structure (PDB code 6O2S) into the corresponding density in the sub-tomogram average (Fig. 2a). j, Same view as in i with the models shown as surface representations coloured by their Coulomb potential. k, l, ‘Peeling off’ of the structure shown in j, with the LRRK2RCKW filament seen from the perspective of the microtubule surface (k) and the microtubule surface seen from the perspective of the LRRK2RCKW filament (l). Note that the acidic C-terminal tubulin tails are not ordered in the microtubule structure and are therefore not included in the surface charge distributions. The Coulomb potential colouring scale is shown on the right.
Extended Data Fig. 8 | Inhibition of motor motility by wild-type and I2020T mutant LRRK2RCKW. a, Example kymographs showing that increasing concentrations of LRRK2RCKW reduce kinesin runs. b, Example kymographs showing that 25 nM LRRK2RCKW reduces dynein runs. c, Representative kymographs of kinesin motility in the presence or absence of wild-type and I2020T mutant LRRK2RCKW. d, The percentage of motile kinesin events per microtubule in the absence of LRRK2 or in the presence of 25 nM wild-type or I2020T mutant LRRK2RCKW. Data are mean ± s.d. (n = 12 microtubules per condition quantified from two independent experiments). There is a significant difference between 0 nM and both 25 nM RCKW conditions (P < 0.0001), but no significant (ns) difference between the inhibitory effects of wild-type LRRK2RCKW versus I2020T mutant LRRK2RCKW as calculated using the Kruskal–Wallis test with Dunn’s posthoc for multiple comparisons (compared to no LRRK2RCKW).
The model refinement statistics are reported for four different types of model, two including GDP-Mg2+ in the ROC domain and two excluding it. In each case, we report statistics for two types
of model: ‘Monomer w/interfaces’ consists of an LRRK2RCKW monomer plus fragments from the neighbouring monomers in the C3 trimer that were used during model building and refinement;
‘Top 10 monomers’ are the top-10 results from Rosetta Relax with the neighbouring fragments removed after processing in Rosetta. PDB accession numbers for the models and the EMD code
of the maps used for model-building and refinement are indicated. EMD-21250 contains both the C3 map of the LRRK2RCKW trimer used to build the COR-B, kinase and WD40 domains and the
signal-subtracted monomer used to build the ROC and COR-A domains. The final models reported here were refined into the signal-subtracted monomer map (Methods).
*C3 reconstruction.
# Signal-subtracted monomer.
a WD40-mediated dimer.
b COR-mediated dimer.
**Numbers represent the average of the values for all 10 models.
nature research | reporting summary
Corresponding author(s): Samara L Reck-Peterson and Andres E Leschziner
Last updated by author(s): Aug 6, 2020
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Data analysis For electron microscopy experiments, data was processed with Appion, GCTF, MotionCor2, FindEM, Cryolo, Relion3, and Cryosparc2. All
the electron microscopy data processing software sources are referenced in the methods section. For light microscopy experiments, data
was analyzed with ImageJ and used to make image z-maximum projections and kymographs. GraphPad Prism 8 was used for all
statistical analysis of light microscopy data. For Western blots, data was quantified using EmpiriaStudio software (Li-COR).
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers.
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
For electron microscopy experiments, maps and coordinates are deposited on the PDB and EMDB. No image sets or particle stacks will be made available. All the
raw data that went into the biochemical and cell biological analyses for Figures 4 and 5 (and associated Extended Data Figures) were deposited in a spreadsheet
with the manuscript.
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf
Data exclusions For single molecule kinesin experiments, clear bright aggregates (less than 5% of runs) were excluded from the analysis, as these runs display
longer run lengths than typical single-molecule kinesin runs (Brouhard, 2010, Methods Cell Biol). No conclusions change with the addition or
exclusion of these aggregates, and we would be happy to provide the data without exclusion of aggregates if deemed necessary.
Replication All single molecule experiments in Figure 4 and 5 (including dynein and kinesin data with or without drugs) were performed with at least two,
but up to four technical replicates on different days (except EDF10g,h that was only performed with one technical replicate). Major findings
with kinesin and dynein single molecule data have been confirmed by two different protein preps. All cellular data from Figure 5 was
quantified from at least four, but up to ten technical replicates (defined in the Methods as at least 20 cells per coverslip) and independent
experiments were performed on multiple days as outlined in the Methods section.
Randomization This is not relevant. We have no data involving organisms or subjects that would require randomization.
Blinding For the cell biology data in Fig 5c-f, experimenter was blinded to conditions for both the imaging acquisition and analysis of LRRK2 filaments.
Antibodies
Antibodies used mouse anti-GFP (Santa Cruz, clone: B-2, Cat: sc-9996, Lot: ); chicken anti-GFP (AvesLabs, Cat: GFP-1010, Lot: GFP879484); rabbit
anti-alpha-tubulin (ProteinTech, Cat: 11224-1-AP); mouse anti-GAPDH (ProteinTech, Cat: 60004-1-Ig)
Validation All antibodies used are well-validated and highly-specific commercially available antibodies. For LiCOR quantification, linear
Mycoplasma contamination Every new cell line we receive is tested for mycoplasma before expanding and freezing. After thawing, each cell line is tested
again. Once every three months, our lab tests all growing cells for mycoplasma as well. The cells we used in our experiments
were last tested on 10/16/19 and did not contain contamination.
Article
https://doi.org/10.1038/s41586-020-2875-7
Accepted: 14 August 2020
Published online: 4 November 2020
Zheng Ruan1,4, James Osei-Owusu2,4, Juan Du1, Zhaozhu Qiu2,3 ✉ & Wei Lü1 ✉
The proton-activated chloride channel (PAC) is active across a wide range of mammalian cells and is involved in acid-induced cell death and tissue injury1–3. PAC has recently been shown to represent a novel and evolutionarily conserved protein family4,5. Here we present two cryo-electron microscopy structures of human PAC in
a high-pH resting closed state and a low-pH proton-bound non-conducting state.
PAC is a trimer in which each subunit consists of a transmembrane domain (TMD),
which is formed of two helices (TM1 and TM2), and an extracellular domain (ECD).
Upon a decrease of pH from 8 to 4, we observed marked conformational changes in
the ECD–TMD interface and the TMD. The rearrangement of the ECD–TMD interface
is characterized by the movement of the histidine 98 residue, which is, after
acidification, decoupled from the resting position and inserted into an acidic pocket
that is about 5 Å away. Within the TMD, TM1 undergoes a rotational movement,
switching its interaction partner from its cognate TM2 to the adjacent TM2. The
anion selectivity of PAC is determined by the positively charged lysine 319 residue
on TM2, and replacing lysine 319 with a glutamate residue converts PAC to a
cation-selective channel. Our data provide a glimpse of the molecular assembly of
PAC, and a basis for understanding the mechanism of proton-dependent activation.
Acidic pH is crucial for the function of intracellular organelles in the secretory and endocytic pathways. It is also one of the pathological hallmarks of many diseases, including cerebral and cardiac ischaemia, cancer, infection and inflammation. The activity of PAC is stimulated by the lowering of extracellular pH and has been recorded in a wide range of mammalian cells1. By mediating the influx of Cl− and subsequent cell swelling, PAC is implicated in acid-induced cell death2,3. We and others recently used unbiased RNA interference screens4,5 to identify a novel gene, PACC1 (also known as TMEM206), that encodes the PAC channel. Our functional studies revealed that PAC has a key role in acid-induced neuronal cell death in vitro and in ischaemic brain injury in mice4,6.
With no obvious sequence homology to other membrane proteins, PAC represents a completely new family of ion channels4,5. PAC is highly conserved across vertebrates and is predicted to have two transmembrane helices4,5, similar to the acid-sensing ion channel (ASIC) and the epithelial sodium channel (ENaC)7,8. Although the structure and function of ASIC have been extensively studied7,9–12, the architecture of PAC and the mechanisms that underlie its pH sensing and anion selectivity are unknown. To address these questions, we determined structures of human PAC using single-particle cryo-electron microscopy (cryo-EM) combined with patch-clamp electrophysiological studies.

Structural determination
PAC is activated at a pH below 5.5 at room temperature, and is maximally stimulated by protons at around pH 4.6–4 (ref. 1). We determined cryo-EM structures of PAC reconstituted in lipid nanodiscs at pH 8 (pH8-PAC) and pH 4 (pH4-PAC) with estimated resolutions of 3.60 and 3.73 Å, respectively (Extended Data Figs. 1a–c, 2, 3, Extended Data Table 1). The maps were of sufficient quality to carry out de novo model building of the protein (Fig. 1, Extended Data Fig. 4a–d, Extended Data Table 1). The cytoplasmic N and C termini (residues 1–60 and 339–350 in pH8-PAC and 1–52 and 340–350 in pH4-PAC) are disordered in our cryo-EM maps.

Overall architecture
PAC is a trimer. It has a small, ball-shaped ECD sitting on top of a slim and elongated TMD that contains two transmembrane helices (TM1 and TM2) in each subunit (Fig. 1a, b, e, f). This trimeric two-transmembrane-helix architecture is reminiscent of ASIC (Extended Data Fig. 5a–e) and ENaC7,8. The ECD of PAC is heavily glycosylated, with four N-glycosylation sites in each subunit (Fig. 1c, g), consistent with a previous report5 and a deglycosylation assay (Extended Data Fig. 1d).
Alkaline and acidic pH yielded two PAC structures with distinct shapes: pH4-PAC is shorter and bulkier than pH8-PAC, and they differ mainly at the TMD and the ECD–TMD interface. At pH 8, the TM1 helix runs nearly parallel to and forms interactions only with its cognate TM2 (Fig. 1b, d). When the pH drops to 4, TM1 switches its interaction from its cognate TM2 to the adjacent TM2 (Fig. 1f, h). This domain-swapped movement of TM1 has not, to our knowledge, been observed in any
1Department of Structural Biology, Van Andel Institute, Grand Rapids, MI, USA. 2Department of Physiology, Johns Hopkins University School of Medicine, Baltimore, MD, USA. 3Solomon H. Snyder Department of Neuroscience, Johns Hopkins University School of Medicine, Baltimore, MD, USA. 4These authors contributed equally: Zheng Ruan, James Osei-Owusu. ✉e-mail: zhaozhu@jhmi.edu; wei.lu@vai.org
Fig. 1 | Overall architecture of PAC. a, e, Cryo-EM maps of pH8-PAC and pH4-PAC viewed parallel to the membrane. The map refined without using a mask is shown as a transparent envelope. The horizontal dimension of the ECD–TMD interface is represented by the distance between the Cα atoms of adjacent His98 residues. The density for His98 is coloured in yellow in the pH8-PAC (a) and pH4-PAC (e) maps. b, f, Atomic models of pH8-PAC and pH4-PAC. The green subunit is shown as a cartoon and the other two subunits are shown in surface representation. The distances between the centre of mass of the ECD and of the TMD in pH8-PAC (b) and pH4-PAC (f) are shown on the right. c, g, The ECDs of pH8-PAC and pH4-PAC viewed from the extracellular side. Four putative glycosylation sites (Asn148, Asn155, Asn162 and Asn190) are labelled in c. The distances between the centre of mass of the ECDs of each subunit in pH8-PAC (c) and pH4-PAC (g) are shown by the triangles. NAG, N-acetyl-d-glucosamine. d, h, The TMDs of pH8-PAC and pH4-PAC viewed from the intracellular side. A light salmon arrow in d indicates the rotation of TM1 of PAC after acidification to pH 4. The relative position and distance of TM1 and TM2 in pH8-PAC (d) and pH4-PAC (h), which are represented by the Cα atoms of Ile73 and Lys319, respectively, are shown at the bottom. The double-headed arrow indicates the interaction between TM1 and TM2.
other two-transmembrane-helix channels12,13, implying a novel gating mechanism.
The ECD–TMD interface consists of part of the TM1 helix on the extracellular side and two linkers that connect the TMD and the ECD: the TM1–β1 linker and the β14–TM2 linker. This interface differs substantially between the two PAC structures (Fig. 1b, f). At pH 8, TM1 and the short TM1–β1 linker hold the adjacent ECD through an ‘anchor’ residue, His98, while the β14–TM2 linker is extended as a loop close to the pore axis (Fig. 1b). At pH 4, the β14–TM2 linker is remodelled into a short pre-TM2 helix, and the TM1–β1 linker moves outward, causing a vertical compression and an expansion of the ECD–TMD interface (Fig. 1a, b, e, f, Supplementary Video 1). In tandem with the rearrangement of the ECD–TMD interface, the ECD at pH 4 shows a vertical movement towards the TMD and a contraction towards the pore axis, resulting in a shorter overall structure and a more compact ECD in comparison to that at pH 8 (Fig. 1b, c, f, g).

Structure of single subunits
At pH 8, each PAC protomer adopts an arm-like structure, with the ECD as the hand, the ECD–TMD interface as the wrist and the TMD as the forearm (Extended Data Fig. 4e). The hand-like ECD is composed of a palm, a finger, a thumb and a β-ball domain, all of which consist of β-strands except for the thumb domain, which contains two short α-helices (Extended Data Fig. 4e–g). The finger and the β-ball domains are connected by a disulfide bond (Cys128–Cys149), forming a rigid structure that occupies the peripheral region of the ECD. The connection between the ECD and the TMD is achieved by the TM1–β1 and β14–TM2 linkers, which together form the wrist domain.
The TMD consists of two transmembrane helices, TM1 and TM2, at the N terminus and C terminus of the protein, respectively. TM1 contains mostly hydrophobic residues and makes direct contacts with the lipid bilayer. TM2 contains both hydrophilic and hydrophobic residues and lines the ion-conducting pore. Although the ECD mostly maintains its conformation in both pH states, the ECD–TMD interface and TMD differ substantially at pH 4 and pH 8, characterized by the distinct conformations of His98 and TM1 (Extended Data Fig. 4e, f). At pH 8, TM1 is approximately parallel to TM2, whereas at pH 4, the two transmembrane helices form an angle of 64°.
The TM2 helix of PAC is a continuous α-helix and differs from the ASIC TM2, which has a characteristic two-segment structure and a Gly-Ala-Ser belt10 (Extended Data Fig. 5b, e). The ECD of PAC shows strong similarities to the β-sheet core of the ECD in ASIC, despite sharing limited protein-sequence identity (Extended Data Figs. 5e, 6). Notably, the PAC ECD lacks the large exterior helical structures of the ASIC ECD, which are involved in pH sensing14 (Extended Data Figs. 5a–d, 6), so PAC must have a different pH-sensing mechanism.

Channel assembly
The major interactions between PAC subunits occur at the ECD, the ECD–TMD interface and in the upper part of the TMD. The lower part
Fig. 2 | Intersubunit interfaces. Each row represents the same view of pH8-PAC (top row; a–e) and pH4-PAC (bottom row; f–j). a, f, The overall structure of PAC shown in cartoon and surface representation. The ECD is divided into the upper ECD and lower ECD for discussion. b, g, The upper ECD viewed from the extracellular side. Phe196, which mediates the intersubunit interaction in the upper ECD, is shown as spheres. c, h, The lower ECD viewed from the extracellular side. At pH 8, Met101 at the beginning of the β1 strand is in the centre of the lower ECD (c, bottom right). At pH 4, the lower ECD undergoes a clockwise inward rotation so that Val103 in the middle of the β1 strand moves to the centre of the lower ECD (h, bottom right). d, i, The ECD–TMD interface viewed parallel to the membrane. e, j, The interaction interfaces at the TMD.
of the TMD lacks extensive interactions and is thus flexible. Analysis of the conformational changes at the ECD revealed a rigid-body contraction of the entire ECD and an iris-like rotation of the lower ECD (Supplementary Video 1). We thus looked at both the upper ECD (finger and β-ball domains) and the lower ECD (palm and thumb domains) to study their intersubunit interfaces (Fig. 2a, f).
At pH 8, both the upper and the lower ECD have loose intersubunit interfaces with an obvious gap between subunits (Fig. 2b, c). The upper ECD has two major intersubunit interactions. The first is formed between the adjacent β8 and β12 strands that run approximately parallel to each other. The second is formed between the β6–β7 linker and the adjacent finger domain, where the Phe196 residue on the β6–β7 linker is inserted into the finger domain, forming both hydrophobic and cation–π interactions (Fig. 2b, g, Extended Data Fig. 1e). Substitution of Phe196 with an alanine residue (F196A) yielded a misassembled mutant protein that exhibited a markedly decreased channel activity compared to the wild type (Extended Data Fig. 1f, g). This suggests that the upper ECD has an important role in channel assembly. The lower ECD has a single major interface at the centre, where the three Met101 residues on the N terminus of the β1 strand tightly interact with each other. This interface disconnects the central pore from the ECD to the TMD (Fig. 2c). At pH 4, the gaps in both the upper and the lower ECDs are mostly filled, creating extensive interactions between subunits (Fig. 2g, h). Moreover, in the centre of the lower ECD, owing to the iris-like rotation of the ECD from pH 8 to pH 4, the Val103 residue in the middle of the β1 strand now mediates the contact (Fig. 2h).
The intersubunit contact at the ECD–TMD interface is mediated through His98 at the TM1–β1 linker. At pH 8, His98 is surrounded by hydrophilic and hydrophobic residues of the β1 and β14 strands of the adjacent ECD, constituting a resting ECD–TMD interface (Fig. 2d). At pH 4, this interface is remodelled; His98 interacts with a pocket formed by residues in the β10–β11 linker of its cognate ECD and in the β1–β2 linker of the adjacent ECD (Fig. 2i). Because this pocket is constructed solely of negatively charged residues, we call it an ‘acidic pocket’.
At the TMD, PAC has two major intersubunit interfaces (Fig. 2e, j): the TM1–TM2 interface and the TM2–TM2 interface. At pH 8, both interfaces are near the extracellular part of the TMD. At pH 4, the TM2–TM2 interface remains mostly unchanged, but the TM1–TM2 interface slips towards the intracellular side as a result of the domain-swapped movement of TM1.

Ion-conducting pathway and selectivity
The PAC channel has a central pore along the symmetry axis with wide openings at the extracellular and intracellular ends in both pH states (Fig. 3a–d). Within the TMD, the ion-conducting pore is lined by TM2 and the β14–TM2 linker (Fig. 3b, d). At pH 8, the ion-conducting pore is occluded at several positions (Fig. 3e, g), thus representing a high-pH resting closed state. At pH 4 (Fig. 3f, g), the intracellular part of the pore has an enlarged radius of 0.82 Å, but is still not wide enough to allow the permeation of Cl− ions, thus representing a low-pH protonated non-conducting state. We found that PAC exhibits a strong outward rectification such that either the open probability or the single-channel conductance is low at 0 mV, the voltage at which the cryo-EM structures were determined (Fig. 3h). Moreover, PAC showed a marked desensitization after prolonged treatment with a pH 4 solution (Extended Data Fig. 7a–f). Therefore, we suggest that a closed pore in the pH4-PAC structure represents either a pre-open state or a desensitized state.
To reveal the molecular determinants that are responsible for the anion selectivity of PAC, we examined the positively charged residues within the ion-conducting pore, all of which are located in the intracellular half of the TM2 helix. The Lys319 residue appears to be an ideal candidate, because it is immediately below the intracellular restriction site (Leu315) and forms a positively charged ‘triad’ around the intracellular entry point (Fig. 3e, f). In line with this hypothesis, we found that a charge-reversing mutation (K319E) converted PAC from an anion-selective to a cation-selective channel with a pronounced inwardly rectifying current (Fig. 3h–j, Extended Data Fig. 8a, b). By contrast, mutation of two other lysine residues, K325E and K329E, resulted in mutant proteins that behaved similarly to wild-type PAC (Extended Data Fig. 8a–e). The crucial role of Lys319 is further supported by its conservation across species (Extended Data Fig. 6), and by the fact that PAC(K319C) is not functional5. Together, our data provide evidence that Lys319 is the determinant of anion selectivity for PAC.
In the ECD, the pore along the symmetry axis has a large vestibule in the middle (Fig. 3b, d). This vestibule is constricted at the ECD–TMD interface by an ECD–TMD seal in both the pH8-PAC and the pH4-PAC structures (Fig. 3e–g, Extended Data Fig. 8f, i). This leads to the question of how ions might enter the ion-conducting pore from the extracellular side. Just below the seal, we observed three lateral fenestrations that connect to the central pore. Fenestrations at similar locations have been defined as an ion-entry point in both the ASIC and P2X channels9,13,15. At pH 8, the fenestration in PAC is formed by the extracellular portion of the TM1 helix and the β14–TM2 linker of the adjacent subunit (Extended Data Fig. 8f). The entrance is surrounded by several negatively charged residues, making it unfavourable for conducting anions (Extended Data Fig. 8g). At pH 4, a different fenestration is established by the β1 strand and the pre-TM2 helix in the adjacent subunit (Extended Data Fig. 8i). The fenestration at pH 4 is wider than that at pH 8, and has several positively charged residues lining the entry point, rendering it favourable for anions (Extended Data Fig. 8i, j). To provide evidence that these fenestrations could be extracellular ion-entry points in PAC, we performed molecular dynamics simulations and found that the fenestrations in the pH4-PAC structure are hydrated, whereas those in the pH8-PAC structure are less accessible to solvent (Extended Data Fig. 8h, k). Our data suggest that the lateral fenestrations could be an extracellular ion-entry point that is common to two-transmembrane-helix channels9,13,15. This agrees with a previous report in which treatment with a thiol-reactive reagent, MTSES, partially inhibited the ion-channel activity of PAC when Thr306, which is part of the fenestration, was replaced by a cysteine residue5.

Mechanisms of pH sensing and channel activation
To elucidate the pH-sensing mechanism of PAC, we compared the structures of pH8-PAC and pH4-PAC. We focused on the ECD and the ECD–TMD interface, because PAC is activated by extracellular acid. Superimposing a single subunit revealed that, upon protonation, the major motion of the extracellular region occurred at the ECD–TMD interface, whereas the ECD showed minor rigid-body movement (Fig. 4a). This suggests that the ECD–TMD interface probably participates in pH sensing. We hypothesized that the His98 residue in the TM1–β1 linker is one of the key pH sensors, because it showed a large movement from the high-pH resting state to the low-pH proton-bound state and because its side-chain pKa value is close to the pH50 (pH of half-maximal activation) value of PAC16 (Extended Data Fig. 9a).
At pH 8, His98 is in close contact with the Gln296, Ile298 and Ser102 residues of the adjacent ECD. We speculated that the side chain of His98 forms a hydrogen bond with the side-chain amine group of Gln296, which locks the TM1 helix in a conformation parallel to its cognate TM2 helix (Fig. 4a, b). To investigate whether the interaction between

Fig. 3 | Ion-conducting pathways and anion selectivity. a, c, pH8-PAC (a) and pH4-PAC (c) in surface representation, coloured according to the electrostatic surface potential from −3 to 3 kT/e (red to blue). Titratable residues are assigned to their predominant protonation state at pH 8 (a) or pH 4 (c) based on PROPKA. b, d, The pore profiles of pH8-PAC (b) and pH4-PAC (d) models along the symmetry axis. Pore-lining residues are shown. e, f, Enlargements of the boxed areas in b (e) and d (f), respectively. The positions of the ECD–TMD seal and fenestration site are labelled. g, Pore radius plots of the profiles in e, f. h, The representative current (I)–voltage (V) relationship for wild-type (WT) PAC and PAC(K319E). The pipette solution contains 150 mM NaCl; the bath solution contains 150 mM (black) or 15 mM (red) NaCl. i, The reversal potential (Vrev) of wild-type PAC and PAC(K319E) from recordings in h (n = 16 (wild type) and n = 7 (K319E)). Data are mean ± s.e.m. Individual data points are shown as dots. j, The relative Cl−/Na+ permeability (PCl/PNa) for wild-type PAC (n = 15) and PAC(K319E) (n = 11) calculated from current induced at pH 5 and 100 mV. Data are mean ± s.e.m. of the permeability ratio. Individual data points are shown as solid dots. The average PCl/PNa permeability values are indicated at the top for each construct.

Fig. 4 | Mechanisms of pH sensing and channel activation. a, Superposition of a single subunit of pH8-PAC (blue) and pH4-PAC (red) aligned using the ECD palm domain. The 3 Å centre-of-mass distance indicates the rigid-body movement of the ECD. b, Close-up view of the conformational change in the ECD–TMD interface in a. Structural elements and residues in the pH4-PAC structure are labelled with a prime symbol. Residues from adjacent subunits are coloured in bright and light colours, respectively. At pH 8, His98 interacts with Gln296. At pH 4, the side chain of His98 interacts with an acidic pocket. c, Comparison of the TMD viewed from the intracellular side. The structures of pH8-PAC (blue) and pH4-PAC (red) are aligned using the ECD. d, pH dose–response curve of wild-type PAC and various PAC mutants. Data are mean ± s.e.m. of the current at 100 mV, normalized to pH-4.6-induced current (n = 10 (wild type, H98R, E107R), n = 9 (H98A) and n = 11 (Q296A)). The Hill coefficients for wild-type PAC and PAC(E107R) are 2.44 ± 0.18 and 1.18 ± 0.19 (mean ± s.e.m.), respectively. e, pH50 value estimated from the pH dose–response curve. The centre and error bar represent the estimated pH50 value and s.e.m. from the nonlinear fitting in d. A one-way analysis of variance (ANOVA) with Bonferroni post-hoc test was used to determine the significance (P values are indicated).
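The pH dose–response curves of Fig. 4d are summarized by a pH50 and a Hill coefficient. As a hedged illustration of what those two parameters mean, the sketch below evaluates the standard Hill equation written in pH units. The functional form is an assumption about the fit (the legend says only that a nonlinear fit was used), the function name is hypothetical, and the pH50 of 5.0 in the usage lines is a placeholder rather than a reported value; only the Hill coefficients 2.44 and 1.18 come from the legend.

```python
def hill_activation(ph, ph50, n_hill):
    """Fraction of maximal current for a proton-activated channel.

    Hill equation in proton concentration, [H]^n / ([H]^n + EC50^n),
    rewritten with [H] = 10**(-pH) and EC50 = 10**(-pH50).
    """
    return 1.0 / (1.0 + 10.0 ** (n_hill * (ph - ph50)))


if __name__ == "__main__":
    # Placeholder pH50; Hill coefficients as given for wild type and E107R.
    for ph in (7.0, 6.0, 5.5, 5.0, 4.6, 4.0):
        wild_type = hill_activation(ph, ph50=5.0, n_hill=2.44)
        e107r = hill_activation(ph, ph50=5.0, n_hill=1.18)
        print(f"pH {ph:.1f}: n=2.44 -> {wild_type:.3f}, n=1.18 -> {e107r:.3f}")
```

At pH = pH50 the function returns 0.5 regardless of the Hill coefficient; a larger coefficient makes the transition around pH50 steeper.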
Extended Data Fig. 1 | Purification of PAC and biochemical and biophysical analysis. a, Fluorescence size-exclusion chromatography (FSEC) of PAC–GFP solubilized in GDN detergent. b, SDS–PAGE gel of purified PAC–GFP protein after metal affinity chromatography. The uncropped source gel of the image can be found in Supplementary Fig. 1a. The gel was repeated three times from different batches of purification and similar results were obtained. c, SEC profile of PAC in MSP3D1 nanodiscs. d, A deglycosylation assay of PAC–GFP with or without PNGase F treatment. The GFP and far-red signals (Alexa 488 and Alexa 680) of the gel were detected and merged using a ChemiDoc imaging system (BioRad). The uncropped source gel of the image can be found in Supplementary Fig. 1b. The deglycosylation assay was repeated twice with similar results. e, F196 mediates intersubunit interactions by forming a cation–π interaction with Arg237′ and hydrophobic interactions with Tyr267′ and Phe282′ from the adjacent subunit. The two subunits are in green and blue. f, FSEC traces of GFP-tagged wild-type PAC and the F196A mutant solubilized using GDN detergent. The peak of F196A is shifted and broader compared to the wild type, suggesting that F196A interferes with the proper assembly of PAC. g, The whole-cell current density of wild-type PAC and PAC(F196A) recorded at pH 4.6 with a holding potential of 100 mV. The centre and error bar represent mean and s.e.m. A two-tailed unpaired t-test was used to determine the difference in current density between F196A and the wild type (P = 3.09 × 10^−6). The D’Agostino & Pearson omnibus test was performed to check the normality of the data (P values are 0.846 and 0.349 for wild type (n = 10) and F196A (n = 11), respectively). *** denotes P < 0.001.
Extended Data Fig. 2 | Workflow for cryo-EM data-processing of pH8-PAC
and data statistics. a, A total of 16,733 raw movie stacks were collected and
processed with motion correction, CTF estimation and particle picking.
Particles were subjected to two rounds of 2D classification and a 3D
classification run to obtain a homogeneous particle set. To further sort out
conformational heterogeneity, we attempted to subtract and classify (1)
particles without nanodiscs and (2) the ECD of PAC (residues 72–317) by using a
mask. Subsequent refinement allowed us to obtain a map at 3.60-Å resolution
for the entire PAC protein and 3.36-Å resolution for the ECD. b, Representative
micrograph, 2D class averages, Fourier shell correlation (FSC) curves and
angular distribution of particles used for 3D reconstruction for the pH8-PAC
dataset. The gold-standard 0.143 threshold was used to determine map
resolution based on the FSC curve. The threshold for model versus map
correlation was 0.5 to determine the resolution.
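The resolution criteria in this legend amount to finding the spatial frequency at which a Fourier shell correlation curve first falls below a threshold (0.143 for the half-map curve, 0.5 for model versus map) and taking its reciprocal. The sketch below illustrates that lookup with linear interpolation between sampled shells. It is not the processing software's implementation, the function name is hypothetical, and the curve in the test is a toy example rather than data from this study.

```python
def resolution_at_threshold(frequencies, fsc_values, threshold=0.143):
    """Resolution (Å) at the first downward crossing of an FSC threshold.

    frequencies: spatial frequencies in 1/Å, in increasing order.
    fsc_values:  FSC value for each frequency shell.
    Returns None if the curve never drops below the threshold.
    """
    for i in range(1, len(frequencies)):
        previous, current = fsc_values[i - 1], fsc_values[i]
        if previous >= threshold > current:
            # Linear interpolation between the two shells around the crossing.
            fraction = (previous - threshold) / (previous - current)
            crossing = frequencies[i - 1] + fraction * (frequencies[i] - frequencies[i - 1])
            return 1.0 / crossing
    return None
```

Because the 0.5 threshold is crossed at a lower spatial frequency than 0.143 on the same curve, it always reports a numerically larger (more conservative) resolution value.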
Extended Data Fig. 4 | Local-resolution cryo-EM maps, representative densities of cryo-EM maps and domain organization of human PAC. a, The local resolution of the pH8-PAC map. A non-sliced (left) and a sliced (right) view of the map viewed parallel to the membrane are shown. The unit for the colour key is Å. b, Representative densities of several secondary structural elements of pH8-PAC. The atomic model is overlaid with the density to show the side chain information. c, The local resolution of the pH4-PAC map. A non-sliced (left) and a sliced (right) view of the map viewed parallel to the membrane are shown. The unit for the colour key is Å. d, Representative densities of several secondary structural elements of pH4-PAC. The atomic model is overlaid with the density to show the side chain information. e, The pH8-PAC single subunit viewed parallel to the membrane. The wrist, palm, thumb, finger and β-ball domains are highlighted. f, The pH4-PAC single subunit viewed in the same orientation as the right image of panel e. g, Domain organization of PAC. Clusters of secondary structure that form the palm, finger, thumb and β-ball domains are labelled.
Extended Data Fig. 5 | Comparison of the structures of PAC and ASIC. a–d, Structural comparison of human PAC (a, c) with chicken ASIC1a (b, d) viewed parallel to the membrane (a, b) and from the extracellular side (c, d). The acidic pockets of human PAC and chicken ASIC1a are in different locations. e, Overlay of the pH8-PAC (blue) and pH4-PAC (red) single subunit with the chicken ASIC1a (green) subunit. The ECD of ASIC1a is composed of a β-sheet core and the exterior helical structure. Although the β-sheet core shares high similarity with the human PAC structure, the chicken ASIC1a TMD is organized differently from that of the human PAC.
Extended Data Fig. 6 | Sequence alignment of PAC homologues and ASIC. Sequence alignment of PAC homologues (from human, frog (XENLA) and zebrafish (DANRE)) and chicken ASIC1. The ASIC1 sequence is aligned with PAC based on the structural alignment using TMalign⁴⁵. Secondary structural (SS) elements of PAC are labelled at the top, whereas the SS elements of ASIC1 are indicated at the bottom. Cysteine residues mediating disulfide bonds in the extracellular domain of PAC are marked with yellow dots. Putative N-linked glycosylation sites of PAC are highlighted with green dots. Lys319 of PAC is marked with red dots. The pre-TM2 helix observed in the pH4-PAC structure is indicated with a red frame. PAC lacks the α1, α2, α3, α4 and α5 helices that form the ECD exterior helical structure in chicken ASIC1a, whereas the αA and αB helices are unique to PAC.
Extended Data Fig. 7 | PAC channel desensitization. a, A representative whole-cell current trace of PAC in wild-type HEK293 cells upon extracellular acidification at pH 4.6 and pH 4.0 with a holding potential at 100 mV. Substantial desensitization was observed during the prolonged exposure to the pH 4.0 solution (position 4 versus position 3), but not to the pH 4.6 solution (position 2 versus position 1). b, Quantification of PAC desensitization (pH 4.6 (n = 12) and pH 4.0 (n = 11)) as shown in a. Activation and desensitization currents are normalized to the initial PAC currents. The x axis numbers correspond to the red marker location in a. Each data point is represented by a solid dot. The mean and s.e.m. are represented by the bar graph. c, Representative whole-cell current–voltage traces of PAC at the beginning (position 3 in a) and the end (position 4 in a) of pH 4.0 treatment. d, Reversal potential of PAC at the beginning and the end of pH 4.6 and pH 4.0 treatment, respectively (n = 9). Two-tailed paired t-test was used to determine significance (P values are 0.361 and 0.077 for pH 4.6 and pH 4.0, respectively). D’Agostino & Pearson omnibus test was performed to check the normality of the data (P values are 0.673 and 0.335 for pH 4.6 and pH 4.0 conditions, respectively). NS indicates P > 0.05. e, Whole-cell patch-clamp recording configuration with 50 mM NaCl pipette solution and 150 mM bath solutions (scheme depicted on the left). This creates the concentration gradient necessary to observe any potential PAC current at 0 mV. Owing to the small amplitude of endogenous PAC current at 0 mV, we transfected PAC cDNA in PAC knockout HEK293 cells. The representative whole-cell current trace of PAC upon acidification at 0 mV is shown on the right. Locations 1 and 3 represent initial activation of PAC immediately after acidic buffer treatment. Locations 2 and 4 represent desensitized PAC after prolonged acidic buffer treatment. f, The desensitized currents (positions 2 and 4 in e) are normalized to the initial PAC currents (positions 1 and 3 in e). The desensitized data currents are represented by the normalized average ± s.e.m.
Extended Data Fig. 8 | Lateral fenestration and ion selectivity of PAC. a, The reversal potential (Vrev) of wild-type PAC, PAC(K325E) and PAC(K329E) at 150 mM NaCl (black) or 15 mM NaCl (red) in the bath solution (internal solution contains 150 mM NaCl). The bar graph represents the mean and s.e.m. (n = 16 (wild type), n = 8 (K325E) and n = 6 (K329E)). Individual data points are shown as dots. The same data points for the wild type were also used in Fig. 3i for comparison with K319E. b, The relative Cl−/Na+ permeability for wild-type PAC (n = 16), and K325E (n = 8) and K329E (n = 6) mutants calculated from the pH-5-induced current at 100 mV. The centre and error bar represent the mean and s.e.m. of the permeability ratio. Individual data points are shown as solid dots. The same data points for the wild type were also used in Fig. 3j for comparison with K319E. The average PCl/PNa permeability values are indicated for each construct. c, The current density of wild-type PAC (n = 10), and K325E (n = 10) and K329E mutants (n = 10) at pH 4.6 with a holding potential of 100 mV. The bar graph shows the average normalized current density ± s.e.m. One-way ANOVA with Bonferroni post-hoc test was used to determine the significance (P values are 0.832 and 0.416 for K325E and K329E, respectively). D’Agostino & Pearson omnibus test was performed to check the normality of the data (P values are 0.255, 0.153 and 0.293 for the wild type and K325E and K329E mutants, respectively). NS indicates P > 0.05. d, The pH dose–response curve of wild-type PAC, PAC(K325E) and PAC(K329E). The currents are normalized to those at pH 4.6 (n = 8 (wild-type PAC), n = 6 (PAC(K325E)) and n = 7 (PAC(K329E))). The currents at different pH are represented by the average normalized currents ± s.e.m. A nonlinear fitting to a sigmoidal dose–response curve is generated for each construct. e, Representative whole-cell patch-clamp recording at pH 5.0 with 150 mM NaCl pipette solution and 150 mM (black) or 15 mM NaCl (red) bath solutions. The current–voltage relationships of wild-type (left), K325E (middle) and K329E (right) PAC in two different bath solutions are plotted. The same wild-type traces were also shown in Fig. 3j (left) for comparison with K319E. f, i, The pH8-PAC and pH4-PAC extracellular fenestration viewed from the extracellular side (left) and parallel to the membrane (right), respectively. Residues forming the fenestration are shown in sticks, including three negatively charged residues (Asp91, Glu94 and Glu250) for pH8-PAC and two positively charged residues (Arg93 and Lys294) for pH4-PAC. g, j, Radius of the fenestration tunnel, estimated by CAVER v.3.0, for pH8-PAC (g) and pH4-PAC (j). The horizontal line marks the smallest radius along the tunnel. The residues lining the fenestration tunnel are marked. h, k, Fenestration water-density plot for pH8-PAC (h) and pH4-PAC (k) from a 100-ns MD simulation. Water molecules in the Z range of the side fenestration site are projected to the X/Y plane and are shown as a 2D histogram.
Extended Data Fig. 9 | His98 is involved in PAC pH sensing. a, pKa prediction of titratable residues for the pH8 and pH4 structures of human PAC. The mean and error bar (standard deviation) are calculated based on 1,000 fixed-backbone rotamer ensembles generated from each structure (see Methods). b, SDS gel of GFP-tagged wild-type PAC, PAC(H98C/Q296C) and PAC(H98S/Q296S). A dimeric band is observed for the H98C/Q296C mutant, but not for the wild type and the H98S/Q296S mutant. The unedited source gel of the image can be found in Supplementary Fig. 1c. The gel was independently repeated twice with similar results. c, The FSEC profile of GFP-tagged wild-type PAC, PAC(H98C/Q296C) and PAC(H98S/Q296S) solubilized using GDN detergent. d, The whole-cell current density of wild-type PAC, PAC(H98C/Q296C) and PAC(H98S/Q296S) recorded at pH 5.0 at 100 mV. The bar graph shows the average current density (nA/pF) ± s.e.m. Each individual data point represents a cell (n = 8 (wild type), n = 10 (H98C/Q296C) and n = 12 (H98S/Q296S)). Two-tailed unpaired t-test was used to determine the difference in current density compared to the wild type (P values are 1.08 × 10⁻⁶ for H98C/Q296C and 0.321 for H98S/Q296S). D’Agostino & Pearson omnibus test was performed to check the normality of the data (P values are 0.328, 0.154 and 0.727 for the wild type and the H98C/Q296C and H98S/Q296S mutants, respectively). e, The pH dose–response curve of wild-type PAC and PAC(H98S/Q296S). The currents are normalized to those at pH 4.6 (n = 5 (wild-type PAC); n = 6 (PAC(H98S/Q296S))). A nonlinear fitting to a sigmoidal dose–response curve is generated for each construct. Bar plot shows the mean ± s.e.m. f, The pH50 of wild-type PAC and PAC(H98S/Q296S) estimated from the pH dose–response curve. The centre and bar represent the estimated pH50 and s.e.m. from the nonlinear fitting in e. Two-tailed Mann–Whitney test was used to determine the significance (P = 0.0087). g, The proposed pH-sensing mechanism for PAC. At high pH, the deprotonated His98 residue is surrounded by Gln296, Ser102 and Iso298, and TM1 pairs with TM2 from the same subunit. At low pH, the protonated His98 residue undergoes a conformational change and moves into an acidic pocket. As a result, TM1 dissociates from the resting interface and rotates to interact with TM2 of the adjacent subunit. For all panels, NS indicates P > 0.05, ** denotes a P value between 0.01 and 0.001 and *** denotes P < 0.001; n represents measurements from biologically independent cells.
Extended Data Table 1 | Cryo-EM data collection, refinement and validation statistics
nature research | reporting summary
Corresponding author(s): Wei Lü
Last updated by author(s): Aug 5, 2020
Reporting Summary
Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist.
Statistics
For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section.
n/a Confirmed
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement
A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly
The statistical test(s) used AND whether they are one- or two-sided
Only common tests should be described solely by name; describe more complex techniques in the Methods section.
For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.
For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings
For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated
Our web collection on statistics for biologists contains articles on many of the points above.
Data analysis Gctf-1.06, ctffind-4.1.10, Gautomatch-0.56, Relion-3.0, CryoSparc-v0.6.5, coot-0.8.9.2, pymol-2.3.2, Motioncor2-1.2.1,
phenix.real_space_refine_dev_3500, phenix.molprobity_dev_3500, UCSF chimera_1.13.1, UCSF chimeraX_0.91, GraphPad Prism 6 and 7,
ClampFit_10.6, GROMACS version 2019.2, OPM server (https://opm.phar.umich.edu/ppm_server), CHARMM-GUI (http://www.charmm-gui.org/),
Rosetta 2020.08.61146, propka3.1, TMalign v20190822
For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers.
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.
Data
Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable:
- Accession codes, unique identifiers, or web links for publicly available datasets
- A list of figures that have associated raw data
The cryo-EM density map and coordinates of pH8-hsPAC and pH4-hsPAC have been deposited in the Electron Microscopy Data Bank (EMDB) under accession
numbers EMD-22403 and EMD-22404 and in the Research Collaboratory for Structural Bioinformatics Protein Data Bank under accession codes 7JNA and 7JNC.
Field-specific reporting
Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences
For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf
Replication We performed each group of experiments with several batches of cells and different transfections, to ensure reproducibility within the lab. The
number of biologically independent experimental replications is indicated in the figure legend. For electrophysiology recordings, the number is
at least 5. SDS–PAGE gel experiments were repeated at least twice.
Randomization Our experiments were not randomized. For electrophysiology experiments, cells with GFP fluorescence (proteins were GFP-tagged) were
randomly selected. Other experiments, including protein expression, solubilization test, protein purification, deglycosylation assay, and cryo-EM
grid preparation and data collection, were repeated multiple times; each time, proteins from different random batches were used.
Blinding The investigators were not blinded; it was not technically or practically feasible to do so for cryo-EM or patch-clamp studies.
Authentication The cells were purchased and routinely maintained in our lab. They were not authenticated experimentally for these studies.
Mycoplasma contamination Sf9 cells, tsA201 cells and HEK293 cells tested negative for Mycoplasma contamination.
Matters arising
Stable agricultural systems are fundamental for the reliability of agricultural production and food security. Recently, Renard and Tilman¹ reported that crop diversity, calculated as the exponential value of the Shannon diversity index of harvested areas of 176 crops, stabilizes national food production. Here we show that crop asynchrony (that is, asynchronous production trends between different crops²) is an even better predictor of agricultural production stability than is crop diversity. Our finding suggests that asynchrony is one important property that can explain why a higher crop diversity supports the stability of national food production, and that it should be considered in strategies to stabilize agricultural production through crop diversification.

We suggest that, as well as yield stability and crop diversity, two additional aspects should be considered in the discussion of the diversity–stability nexus. First, as well as yield stability, the stability of overall production is another relevant aspect of food security. Second, the actual benefits of crop diversity are not related to harvested areas as such, but to the temporal production patterns of the cultivated crops². We suggest that planting multiple crops stabilizes agricultural production only if they experience asynchronous production trends, for example due to distinct responses of the individual crops to climatic, economic and political shocks³. Here we use statistical models to test whether crop asynchrony is a better predictor of agricultural production stability than is crop diversity.

We largely used the same datasets as Renard and Tilman¹ (Extended Data Table 1) and derived the same explanatory variables used in their analysis, including effective crop species diversity⁴, irrigation⁴, nitrogen use intensity⁴, warfare⁵, temperature and precipitation instability⁶⁻⁸ for five ten-year intervals between 1961 and 2010 (see Supplementary Methods for details) to predict the stability of total caloric production⁴,⁹,¹⁰. We additionally calculated synchrony between crop-specific caloric production²,¹¹,¹², an index bounded between 0 and 1, where 1 indicates full synchrony. Asynchrony was then calculated by subtracting synchrony from 1, so that higher values indicate higher asynchrony. We used total production instead of yield stability as the response variable, because this offers additional insights into food security and because it can be directly related to asynchrony (see Supplementary Methods for details). Moreover, total production incorporates the effects of changes in cropland area as a result of planning decisions by farmers and of changes in global market dynamics.

First, we investigated the relationship between effective crop species diversity and crop asynchrony and tested if this relationship changed over time, as crop homogenization has occurred during recent decades¹³. To predict crop asynchrony, we used a linear mixed-effects model with random slopes for diversity and random intercepts for time intervals¹⁴. Second, we investigated how either crop diversity, crop asynchrony or both affect caloric production stability. For this, we constructed the main linear regression model used in Renard and Tilman¹ using production stability as the response variable (see Supplementary Methods for details). We then ran two additional regression models, one in which we replaced crop diversity with crop asynchrony and one in which we added crop asynchrony.

Crop diversity and crop asynchrony were found to be correlated (Spearman’s ρ = 0.49, P < 0.05; Fig. 1a). However, the positive effect of crop diversity on asynchrony decreased over time (Fig. 1a), as indicated by a better performance of a linear mixed-effects model including time interval (Akaike information criterion (AIC) = −340.22) compared to a linear model including crop diversity only (AIC = −336.88). The positive effect of crop asynchrony on caloric production stability was more than three times the effect of crop diversity (Fig. 1b, Extended Data Table 2). Other predictors showed similar trends, although the effect of nitrogen use intensity, time and temperature instability was stronger in the diversity model, whereas the effect of irrigation was lower and insignificant. Moreover, the explanatory power of the model increased from R² = 0.28 in the crop-diversity model to R² = 0.60 in the asynchrony model (Extended Data Table 2). In the model that includes both predictors, the stabilizing effect of crop asynchrony was even stronger and the effect of crop diversity was negative (Fig. 1b, Extended Data Fig. 1); however, explanatory power increased by only 0.01 (Extended Data Table 2). Although crop diversity and asynchrony were correlated, multicollinearity was not an issue in the combined model (the variance inflation factors were less than 2). Given that crop asynchrony was a strong predictor of caloric production stability, we further explored their relationship in the most recent time interval (2001–2010). The highest national crop asynchronies were mainly observed in South and Southeast Asia, China, Central America and parts of Africa (Fig. 2). Countries within these regions typically showed high production stability, and all countries with high asynchrony achieved at least medium stability. Countries with high production stability and low-to-medium asynchrony were mainly found in North and South America (Fig. 2). The 29 countries that had low asynchrony and stability (including Russia, Argentina and Australia) contributed more than 11% of the total crop caloric production.

Our analysis provides an important extension to the results presented by Renard and Tilman¹. We found that the relationship between crop diversity and crop asynchrony decreased over time, which is a potential consequence of the increasing homogeneity of global food supplies¹³. Most importantly, we identified asynchrony as one important crop property (or trait) that can explain why a higher crop diversity supports the stability of national food production. Crop diversity as such provides only limited insights into the mechanism that underlies stability. The benefits of crop diversity depend on the production patterns of the cultivated crops. Therefore, strategies to stabilize agricultural production through crop diversification also need to account for the asynchrony of the crops considered.
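The two crop indices used in this analysis can be sketched in a few lines. Effective diversity follows the definition given in the text (the exponential of the Shannon index of harvested-area shares). The text does not spell out the synchrony formula, so the sketch assumes the common variance-ratio form, the variance of total production divided by the squared sum of the crop-level standard deviations, which is bounded between 0 and 1 as described. Function names and the use of population variance are illustrative choices, not the authors' code.

```python
import math
from statistics import pvariance
from typing import Dict, Sequence


def effective_diversity(areas: Sequence[float]) -> float:
    """Exponential of the Shannon index computed from harvested-area shares."""
    total = sum(areas)
    shares = [a / total for a in areas if a > 0]
    shannon = -sum(p * math.log(p) for p in shares)
    return math.exp(shannon)


def asynchrony(production: Dict[str, Sequence[float]]) -> float:
    """Return 1 - synchrony for per-crop production series of equal length.

    Assumed synchrony index: var(total production) / (sum of crop s.d.)^2.
    """
    series = list(production.values())
    totals = [sum(year) for year in zip(*series)]
    sd_sum = sum(math.sqrt(pvariance(s)) for s in series)
    if sd_sum == 0:
        raise ValueError("all crops are constant; synchrony is undefined")
    return 1.0 - pvariance(totals) / sd_sum ** 2


# Toy inputs for illustration only.
even_areas = effective_diversity([1, 1, 1, 1])                    # four equally sized crops
compensating = asynchrony({"a": [1, 2, 3], "b": [3, 2, 1]})       # crops offset each other
in_step = asynchrony({"a": [1, 2, 3], "b": [2, 4, 6]})            # crops move together
```

Under this form, crops whose fluctuations cancel give an asynchrony of 1 and crops that rise and fall together give 0, which matches the interpretation that higher values indicate higher asynchrony.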
¹UFZ - Helmholtz Centre for Environmental Research, Leipzig, Germany. ²University of Potsdam, Institute of Biochemistry and Biology, Potsdam, Germany. ³University of Münster, Institute of Landscape Ecology, Münster, Germany. ⁴Centre for Biodiversity Monitoring, Zoological Research Museum Alexander Koenig, Bonn, Germany. ⁵University of Göttingen, Agroecology, Department of Crop Sciences, Göttingen, Germany. ⁶University of Göttingen, Centre of Biodiversity and Sustainable Land Use (CBL), Göttingen, Germany. ⁷Institute of Geoscience and Geography, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany. ✉e-mail: lukas.egli@ufz.de
[Fig. 1 graphic. Panel a, x axis: Diversity (ticks 5 to 25). Panel b, predictor labels: Diversity, Asynchrony, √(Irrigation), √(N use intensity), Temperature instability, Precipitation instability, Warfare, Time.]
Fig. 1 | Crop asynchrony as a function of crop diversity and determinants of national caloric production stability. a, Crop asynchrony as a function of crop diversity using a linear mixed-effects model with random slopes for diversity, and random intercepts for time intervals. Dots show national data coloured by time interval (n = 590). b, Regression coefficients for all variables in the linear regression models, including crop diversity (green), crop asynchrony (blue) and both (orange) (n = 590). Caloric production stability was log-transformed; irrigation and nitrogen use intensity were square-root-transformed. Each predictor variable was standardized to 0 mean and 1 s.d. across all nations and time intervals. Data are mean ± s.e.m. *P < 0.05; **P < 0.01; ***P < 0.001; NS, not significant. This figure was created with the statistical software package R 3.6.1¹⁰.
The results from the crop-diversity model are largely similar to the findings of Renard and Tilman¹, because the different response variables (caloric yield versus production stability) were highly correlated (Spearman’s ρ = 0.84, P < 0.05). However, the effect of irrigation was less stabilizing for production compared to yield stability, and the opposite was true for nitrogen use intensity. Moreover, overall production stability significantly decreased over time, which has serious implications for food security.

Asynchrony emerges from the distinct responses of individual crops to climatic, economic and political shocks³. Although there is increasing knowledge about the underlying drivers of overall production losses³, little is known about the effects on individual crops in various environmental and socioeconomic contexts, in particular regarding their temporal variance²,¹⁵ at farm level, at which management decisions are made. Moreover, growing crops in different seasons is additionally expected to increase asynchrony, which should be further investigated. Likewise, we need to better understand the conditions under which asynchrony is needed for and beneficial to stability. Spain, for example, experienced medium asynchrony but low stability in 2001–2010, whereas the opposite was true for Germany. In countries that have low crop asynchrony and stability, planting additional crops with different responses to climatic and market disturbances might be a viable option to increase stability and therefore food security¹⁵, in particular in light of climate change and increasing perturbations in global markets. On the national level, this is especially relevant for countries that are facing severe food insecurity, such as Malawi. For countries such as
[Fig. 2 graphic. Colour legend: Stability (Low to High) against Asynchrony (Low to High).]
Fig. 2 | National crop asynchrony and caloric production stability worldwide. Crop asynchrony and caloric production stability are shown for the 2001–2010 interval and are grouped by tertiles (n = 136). Countries excluded from the analysis are shown in white. The figure was created with the statistical software package R 3.6.1¹⁰.
public repository: https://github.com/legli/AgriculturalStability.
1. Renard, D. & Tilman, D. National food production stabilized by crop diversity. Nature 571, 257–260 (2019).
2. Mehrabi, Z. & Ramankutty, N. Synchronized failure of global crop production. Nat. Ecol. Evol. 3, 780–786 (2019).
3. Cottrell, R. S. et al. Food production shocks across land and sea. Nat. Sustain. 2, 130–137 (2019).
4. The Food and Agriculture Organization of the United Nations Statistics (FAO, accessed 22 November 2019); https://www.fao.org/faostat/en/#data/.
5. Marshall, M. G. Codebook: Major Episodes of Political Violence (MEPV) and Conflict Regions, 1946–2015. http://www.systemicpeace.org/inscr/MEPVcodebook2016.pdf (2016).
Author contributions L.E., M.S., T.T. and R.S. designed the study. L.E. and C.S. performed the analysis. All authors wrote the manuscript.
Competing interests The authors declare no competing interests.
Additional information
Supplementary information is available for this paper at https://doi.org/10.1038/s41586-020-2965-6.
Correspondence and requests for materials should be addressed to L.E.
Reprints and permissions information is available at http://www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© The Author(s), under exclusive licence to Springer Nature Limited 2020
Extended Data Fig. 1 | Main determinants of national caloric production stability. a–h, Effects of crop diversity (a), crop asynchrony (b), irrigation (c), nitrogen use intensity (d), temperature instability (e), precipitation instability (f), warfare (g) and time (h) on caloric production stability. Results are shown for the linear regression models including crop diversity (green), crop asynchrony (blue) and both (orange) (n = 590). Irrigation and nitrogen use intensity were back-transformed from square-root-transformation; predicted values were back-transformed from log-transformation. Predictions were calculated using the observed range of the focal predictor, while keeping all the other predictors at their mean values. Shaded areas represent 95% confidence intervals. The figure was created with the statistical software package R 3.6.1¹⁰.
Linear regression models include crop diversity (diversity model), crop asynchrony (asynchrony model) or both (combined model) (n = 590). Caloric production stability was log-transformed,
irrigation and nitrogen use intensity were square-root-transformed. Predictor variables were standardized to 0 mean and 1 s.d. across all nations and time intervals.
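The variable preparation described in this note can be sketched as follows. The sketch assumes that the square-root transformation is applied before standardization and that the population standard deviation is used; neither detail is stated in the note. All values and names are made up for illustration.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def standardize(values: Sequence[float]) -> List[float]:
    """Rescale values to mean 0 and s.d. 1 (population s.d., an assumption)."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values]


# Hypothetical observations pooled across nations and time intervals.
stability = [2.0, 4.0, 8.0]       # response variable
irrigation = [0.0, 0.04, 0.16]    # predictor

log_stability = [math.log(v) for v in stability]                  # log-transformed response
z_irrigation = standardize([math.sqrt(v) for v in irrigation])    # sqrt, then standardize
```

Standardizing every predictor in this way puts the regression coefficients on a common scale, which is what allows the effect sizes of diversity and asynchrony to be compared directly.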
In the accompanying Comment, Egli et al.1 report findings related to our We have similar concerns about the measure of asynchrony used
Article2 on the stabilization of food production by crop diversity. In our by Egli et al.1. To calculate asynchrony, they used data on the annual
Article, we reported that crop diversity stabilized the combined caloric country-level production of each crop, rather than the annual
yield of all crops in a nation2. Our analyses showed that the portfolio country-level yields that we used. Their metric of asynchrony there-
effect3 was the probable cause of this greater stability, and we found no fore confounds fluctuations in the performance of individual crops—
support for the asynchrony hypothesis4. Egli et al.1 report somewhat as measured by yields—with fluctuations in the area planted to, and
different findings. harvested for, each of these crops, just as does their stability metric.
The results in our Article2 differ from those of Egli et al.1 because Because both yields and harvested area affect the total food supply
we analysed the year-to-year temporal stability of national caloric of a country, both have important implications for the agricultural
yield for all crops combined and analysed for asynchrony among policy of a country. We suggest that viewing each of these separately
individual crops using the national yield of each crop. By contrast, might be better for providing insights into the best way to maximize
Egli et al.1 analysed the stability of total national caloric production the year-to-year reliability of the food supply of a country. The link
and measured asynchrony on the basis of the annual national caloric found by Egli et al.1 between production asynchrony and total crop
production of each individual crop. Although yield and production production stability suggests the possibility that asynchronous vari-
are related to each other, they are not identical. Yield is the crop pro- ation in area planted with and harvested for various crops might also
duction per unit of land, whereas production is the yield multiplied contribute to national food stability. This interesting possibility merits
by the cropland area. Even using their different metric, Egli et al.1 further exploration.
found—as did we—that greater crop diversity led to greater national
temporal crop stability. 1. Egli, L., Schröter, M., Scherber, C., Tscharntke, T. & Seppelt, R. Crop asynchrony stabilizes
We suggest that national yield stability is the more informative and insightful of these two stability metrics because it directly measures the year-to-year reliability of food production from a typical hectare of cropland in a nation. If the total area planted and harvested had been constant from 1961 until now in each nation, the two measures of stability would be identical. However, harvested area has been increasing in many lower-income nations for the past 60 years (an increase of 69% in the least-developed nations since the 1960s⁵), and it first increased and then declined in many high-income nations (a decrease of 29% in Europe since the 1980s⁵). These year-to-year changes in total national cropland area and year-to-year changes in yields both affect the stability metric used by Egli et al.¹. Because the database of the Food and Agriculture Organization of the United Nations (FAOSTAT) reports the area harvested for each crop, not the area planted, the potential to determine the effects of changes in area planted on national food supply stability is limited. Finally, because yield stability is independent of year-to-year changes in national cropland area, we feel that yield stability is more informative of underlying biological mechanisms than is production stability. However, when more detailed data are available, assessing changes and drivers of stability of yield and cropland area would be informative.

and food production. Nature https://doi.org/10.1038/s41586-020-2965-6 (2020).
2. Renard, D. & Tilman, D. National food production stabilized by crop diversity. Nature 571, 257–260 (2019).
3. Doak, D. F. et al. The statistical inevitability of stability–diversity relationships in community ecology. Am. Nat. 151, 264–276 (1998).
4. Loreau, M. & De Mazancourt, C. Species asynchrony and its drivers: neutral and nonneutral community dynamics in fluctuating environments. Am. Nat. 172, E48–E66 (2008).
5. The Food and Agriculture Organization of the United Nations Statistics (FAO, accessed January 2019); https://www.fao.org/faostat/en/#data/.

Acknowledgements We thank the Bren School of Environmental Science and Management of the University of California Santa Barbara for support leading to the initial publication. This work was also supported by a grant overseen by the French ‘Programme Investissement d’Avenir’ as part of the ‘Make Our Planet Great Again’ programme (reference: 17-MPGA-0004) and by a National Science Foundation grant (LTER-1831944).

Author contributions D.R. and D.T. wrote the paper.

Competing interests The authors declare no competing interests.

Additional information
Correspondence and requests for materials should be addressed to D.R.
Reprints and permissions information is available at http://www.nature.com/reprints.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© The Author(s), under exclusive licence to Springer Nature Limited 2020
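The contrast drawn in this reply between yield stability and production stability can be illustrated with a small numerical sketch. It assumes, purely for illustration, that stability is measured as the ratio of the mean to the standard deviation of a time series; the metrics used in the cited papers may be defined or detrended differently. The yield and area figures are hypothetical, not FAOSTAT data.

```python
from statistics import mean, pstdev


def stability(series):
    """Assumed stability measure: mean divided by standard deviation."""
    return mean(series) / pstdev(series)


def production(yields, areas):
    """National production in each year is yield multiplied by harvested area."""
    return [y * a for y, a in zip(yields, areas)]


# Hypothetical national yields (tonnes per hectare) over six years.
yields = [2.0, 2.2, 1.9, 2.4, 2.1, 2.3]

# Case 1: harvested area held constant (hectares).
constant_area = [1000.0] * len(yields)

# Case 2: harvested area expanding every year (hectares).
growing_area = [1000.0, 1100.0, 1200.0, 1300.0, 1400.0, 1500.0]

yield_stability = stability(yields)
prod_stability_constant = stability(production(yields, constant_area))
prod_stability_growing = stability(production(yields, growing_area))

print(f"yield stability:                 {yield_stability:.2f}")
print(f"production stability (constant): {prod_stability_constant:.2f}")
print(f"production stability (growing):  {prod_stability_growing:.2f}")
```

With constant area, production is just yield scaled by a fixed factor, so the two stability values coincide. Once the area changes from year to year, production stability also reflects the trend in area and no longer matches yield stability.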
¹CEFE, CNRS, Univ. Montpellier, University Paul Valéry Montpellier 3, EPHE, IRD, Montpellier, France. ²Bren School of Environmental Science and Management, University of California Santa Barbara, Santa Barbara, CA, USA. ³Department of Ecology, Evolution and Behavior, University of Minnesota, St Paul, MN, USA. ✉e-mail: delphinerenard@hotmail.fr
https://doi.org/10.1038/s41586-020-2952-y
In this Review, the affiliation to which authors Victor Cazalis and Ana
S. L. Rodrigues are attributed (affiliation 2) should be corrected from
‘Centre d’Ecologie Fonctionnelle et Evolutive CEFE UMR 5175, CNRS,
Univ. de Montpellier, Univ. Paul-Valéry Montpellier, EPHE, Montpellier,
France’ to ‘CEFE, Univ. Montpellier, CNRS, EPHE, IRD, Univ. Paul Valéry
Montpellier 3, Montpellier, France’. This error has been corrected
online.
https://doi.org/10.1038/s41586-020-2967-4
In this Article, the Author Information section should have stated that
the miRNA reads for 17 tissues have been deposited at the Sequence
Read Archive (SRA) under accession numbers: SRR12545166–
SRR12545182. The original Article has not been corrected.
https://doi.org/10.1038/s41586-020-2949-6
In the ‘Code availability’ section of this Article, the URL from which
the DREEM clustering algorithm can be accessed was incorrect. The
correct URL is https://codeocean.com/capsule/6175523/tree/v1. The
Article has been corrected online.
https://doi.org/10.1038/s41586-020-2874-8
https://doi.org/10.1038/s41586-020-2955-8
https://doi.org/10.1038/s41586-020-2986-1
In the legend of Fig. 2 of this Article, owing to an error during the production process, panels a and b were inadvertently labelled ‘northeast China (within 38–54° N, 120–135° E)’ and panels c and d were inadvertently labelled ‘southwest China (within 18–30° N, 95–110° E)’, rather than the other way around. The figure was correct. This error has been corrected online.
The pandemic is taking a toll on everyone — but the burden is larger for disadvantaged groups.
The coronavirus pandemic has affected the entire scientific world, but in unequal ways: although some scientists have been able to carry on with their lives and careers, many are struggling with family obligations, financial strain and tenuous employment.

The pandemic has already dimmed job prospects in academia, and the full impacts are probably yet to be seen. Global gross domestic product (the total value of goods produced and services provided) is forecast to shrink this year by 4.9%, according to a study by the Pew Research Center in Washington DC. That decline is expected to hit low-income communities and nations especially hard.

For students and postdocs from less-privileged backgrounds (first-generation students, members of minority ethnic groups or those with financial stress), the pressures of the pandemic are, and will continue to be, particularly intense. “There are a lot of concerns about very talented individuals falling out of the pipeline,” says Barbara Natalizio, chair of the US National Postdoctoral Association, based in Rockville, Maryland, which represents more than 40,000 postdocs.

The current crisis must be a call to action, says Bea Maas, an ecologist at the University of Vienna. Maas was the lead author of a June report on the precarity of early-career researchers during the pandemic’s first wave. “There must be a collective effort by the entire scientific community, especially those in leadership positions, to respond to the short- and long-term challenges of this crisis and to
BEARING A HEAVY EMOTIONAL BURDEN

I’m the co-founder and co-chair of the Yale Black Postdoctoral Association at Yale University in New Haven, Connecticut. Balanc-

postdoc at the Yale School of Medicine, New Haven, Connecticut, and co-chair of the Yale Black Postdoctoral Association.

is something that you can do at home, but it’s not the same as producing results in the lab. On the positive side, I’ve been able to rethink my routine. Sitting at a computer for nine hours per day isn’t necessary for computational work. Flexibility is important for mental health, and universities should support that. I

EMMA HERNANDEZ-SANABRIA
of the conversations that will make science more inclusive.

Daniel Gonzales is a physics postdoc at Purdue University, West Lafayette, Indiana.

CONCERNS OVER AN UNCERTAIN FUTURE

My path to science is really unconventional. I was born and raised in the Dominican Republic. My family came to the United States when I was 16, and I taught myself English when I was 17. I was living on the street before I put myself through college, while working with

I want to be an astronomer, but now I don’t know whether that will work out. I’m really scared about it. I have no idea how things will change as the pandemic worsens, but it doesn’t look good. Stories like mine are a real wake-up call for people who aren’t used to struggle.

Rosa Ferreria is a remote undergraduate student in physics at Arizona State University in Tempe.

Interviews by Carrie Arnold and Chris Woolston
These interviews have been edited for length and clarity.
It’s the long wait for equipment that H. Krishnamurthy remembers from his master’s studies at Bangalore University in India. “I used to stand in line to get my turn to use a rusted hammer to nail [down] a frog for dissection,” he says.

Today, Krishnamurthy directs a facility so that other researchers in Bengaluru and throughout India need never experience that lack of access. The Central Imaging and Flow Cytometry Lab at the National Center for Biological Sciences in Bengaluru “has helped scientists to take their research to next level”, he says. “Before I started this facility, there was no paper published in Cell from India.” Since then, users have published more than half a dozen papers in Cell, as well as others in journals such as the Proceedings of the National Academy of Sciences.

Over the past 20 years, as instrument costs have risen and funding levels fallen, institutions have increasingly consolidated microscopes, mass spectrometers, flow cytometers and other high-tech equipment in specialized core facilities, where dedicated staff can cost-effectively provide a breadth of expertise and access to equipment beyond what any single laboratory could manage. Numbers are hard to come by, but Peter O’Toole, director of the Bioscience Technology Facility at the University of York, UK, has seen meetings for UK core-facility managers grow from a dozen participants in 2006 to around 200 today. And in Germany, the number of imaging core facilities doubled between 2011 and 2015, from 30 to 60.

The people tasked with running these facilities have a rare collection of skills: in-depth knowledge of the hardware they oversee, managerial and financial acumen to run what is effectively a business, and scientific know-how to guide researchers through a range of experimental systems and designs. The management aspects alone would usually fill three jobs (financial manager, project manager and people manager), says Graham Wright, acting director of the Research Support Centre at the Agency for Science, Technology and Research in Singapore. Krishnamurthy was once asked to list his responsibilities, and says he was shocked at how many he had. “This is not a 9-to-5 job, it’s a 24-hour job,” he says.
Where I work
John Fulmer

I have just spent six months in the Great Barrier Reef on the research vessel Falkor. Every cruise tackles cutting-edge science and, as lead technician, I’ve been able to witness so many firsts: the RV Falkor is the first ship to map much of this area at high resolution, and to put a remotely operated vehicle (ROV) down in the Great Barrier Reef at depths of up to 2,000 metres. So everything we’re seeing is new. With the ROV’s two arms, we gather samples of rocks, corals, sediments, jellyfish, everything we come across down there, for collaborators to analyse later.

Recently, we discovered a peculiar knoll rising from the sea floor 2,000 metres down, peaking 300 metres below the surface. The odd thing is that it shouldn’t be there. There’s little to no volcanic activity in the area to create a mound, and it should have been eroded by water like its surroundings. Analysis of the rock samples we took from the knoll might explain things.

I’m the main liaison between the ship’s crew and the scientists travelling with us. Three technicians maintain all the shipboard systems, including an instrument to measure ocean conductivity, temperature and density; a multibeam scanner to map the ocean floor; and the ROV SuBastian, which I’m working on in the photo.

We’ve been lucky that our institution, the Schmidt Ocean Institute in Palo Alto, California, has kept the work going during the pandemic. Everyone undergoes a two-week hotel quarantine and COVID‑19 testing before getting on board. Other researchers haven’t been able to get to sea, so we livestream our ROV dives. Collaborators can log in and tell us what looks worth sampling. Anyone can watch and ask the scientists questions, and receive live answers.

Science never sleeps. I work 12-hour shifts every day for 6-month stretches, with no days off, and I’m on call all the time. I’m just starting six months on leave, but sometimes I’ll tune in to the dives. It’s exciting stuff.

John Fulmer, lead technician for the Schmidt Ocean Institute’s RV Falkor, is currently on leave in Halifax, Canada.
Interview by Amber Dance.

Photograph by the Schmidt Ocean Institute.
If it was easy to change the way we eat, malnutrition in all its forms (undernutrition, overnutrition and micronutrient deficiency) could have been eliminated long ago. Everyone would have access to affordable food and choose to eat the quantity and variety that keeps them in optimal health.

That’s the dream. The reality is that humans have a food problem. It is a complex and multidimensional issue, but in broad brush strokes: huge numbers of people go hungry; large nutritional imbalances persist between high- and low-income nations and regions; and the food system, from production to supply and consumption, is both failing society and damaging the planet.

The COVID-19 pandemic has highlighted the problems, and exacerbated them (see page S57). Political, economic and cultural obstacles are prolific on the pathway to achieving a sustainable global diet (S54). Even if a suitable global menu could be agreed on, changing eating behaviours at scale is a formidable and understudied challenge (S70).

The mainly plant-based diet that nutritional scientists recommend for physical and, more recently, mental health (S63) is better for the environment than diets that are heavy in meat and highly processed foods. To reduce our reliance on farmed meat, scientists around the world are developing affordable protein alternatives. Researchers are racing to transform lab-grown meat from a headline-grabbing novelty into a viable industry supplying supermarkets (S64). And according to projections, aquaculture is ramping up to overtake wild fish stocks as the main source of aquatic protein in diets by 2050 (S60). Farming methods that intensify agricultural production while rebuilding and sustaining natural systems are also becoming more widespread (S58).

Diversity is key. There is no single solution that will guarantee sustainable nutrition for everyone. In the same way that the pandemic demands an integrated, cooperative and global response, in which science plays its part, so does feeding the global population.

We are pleased to acknowledge the financial support of Ajinomoto Co., Inc., Yakult Honsha Co., Ltd and NTT Corporation in producing this Outlook. As always, Nature retains sole responsibility for all editorial content.

Catherine Armitage
Chief editor, Nature Index

Contents

S54 GLOBAL DIET
Healthy people, healthy planet
Eating habits must change if we are to feed 2050’s population

S57 OPINION
Cooperate to prevent food-system failure
Jessica Fanzo explains why governments need to work together

S58 AGRICULTURE
Natural solutions for agricultural productivity
How to farm intensively and sustainably

S60 AQUACULTURE
Cultivating a sea change
Growing the seafood industry sustainably

S63 HEALTH
Eating for better mental health
Mood could be linked to the microorganisms in our gut

S64 CELLULAR AGRICULTURE
Cell-based meat with a side of science
Growing meat at scale is still a challenge

S68 SUSTAINABLE NUTRITION
Research round-up
The latest studies

S70 BEHAVIOUR
Changing diets at scale
Different communities and cultures require different approaches

Editorial: Catherine Armitage, Herb Brody, Richard Hodson, Jenny Rooke
Art & Design: Mohamed Ashour, Annthea Lewis, Denis Mallet
Production: Nick Bruni, Kay Lewis, Ian Pope, Karl Smart
Sponsorship: Yuki Fujiwara, Natasha Boyd, Takeaki Ishihama
Marketing: Gavin Buffett
Project Manager: Rebecca Jones
Creative Director: Wojtek Urbanek
Publisher: Richard Hughes
VP, Editorial: Stephen Pincock
Managing Editor: David Payne
Magazine Editor: Helen Pearson
Editor-in-Chief: Magdalena Skipper
Every morsel of food from every plate, bowl and cooking pot around the world takes a small bite from Earth’s resources. The human diet places a strain on the environment, water resources, biodiversity and just about every other measure of planetary health. With so much at stake, researchers have turned their attention to a pressing question: what sort of diet can the planet realistically support?

The answer requires insights from fields such as nutrition, agriculture and climate research. “We need to produce food groups that are good for health in ways that are restorative to the planet, rather than extractive,” says Corinna Hawkes, director of the Centre for Food Policy at City, University of London. The particular foods on the plate will vary from one place to another, she says, but those meals need to add up to something more sustainable than society’s current fare.

“When you look carefully at the big systems that regulate the stability of our planet, food is a dominant player in essentially all of them,” says Johan Rockström, an environmental scientist at Stockholm University. In 2019, Rockström, Hawkes and other members of an international group of scientists proposed the EAT-Lancet diet¹, a global meal plan that could, in theory, feed 2050’s estimated population of 10 billion people (see ‘Planetary-health diet’). That plan called for drastic cuts in meat consumption and a much higher intake of fruits and vegetables. But it proved controversial with meat-industry proponents and economists, and the quest for a planetary diet
Elizabeth Kimani-Murage addresses community members at a meeting about food insecurity in Nairobi, Kenya.
continues. When researchers and policymakers convene at the United Nations Food Systems Summit in late 2021, a healthy-planet diet will be near the top of the agenda.

The goal will be a basic framework, not an item-by-item menu, says Agnes Kalibata, a food-policy specialist in Kigali, Rwanda, who will be leading the summit as the UN special envoy. “Diets are influenced by cultures and custom,” she says. “We can come up with the principles of what a good diet will look like. We need to find a balance.”

Sustainability on a plate

Most researchers agree that the current diet is not sustainable. A 2018 analysis² estimated that food production releases the equivalent of 13.7 gigatonnes of carbon dioxide in greenhouse gases into the air each year, more than one-quarter of all human-caused greenhouse gases. The same report estimated that agricultural irrigation accounts for about two-thirds of all fresh water used by humans. And about 37% of the planet’s land area, excluding deserts and ice sheets, is already dedicated to food production. That footprint is likely to grow as the population increases.

Some foods take up many more resources than others. At the upper end, just 100 grams of beef protein can result in the release of the equivalent of 105 kilograms of CO₂. The same amount of protein from a well-managed field of peas, by contrast, typically releases the equivalent of only about 0.2 kg of CO₂.

These orders-of-magnitude differences mean that any vision of a more sustainable diet has to include marked reductions in the meat consumption of high-income countries, Hawkes says. She notes that consuming a lot of red meat can raise the risk of cancer and heart disease. “It’s not great for our health, and it’s not great for our planet,” she says. “There’s a strong alignment between health and sustainability.”

“We have to rethink our diets based on who are the most vulnerable among us”.

This convergence of nutrition and conservation is a central message of the EAT-Lancet diet. The authors started by reviewing the best evidence for constructing a diet that would optimize human health and reduce the global toll of food-related health conditions, such as diabetes, heart disease, cancer and obesity. The researchers didn’t even consider the impacts on climate or sustainability until the nutritional framework had been set, Rockström says.

The EAT-Lancet commission ultimately proposed a ‘flexitarian’ diet that spans a spectrum of food groups. It also suggested vegan and vegetarian options. Plants form the foundation of the commission’s flexitarian diet, which recommends the daily consumption of 300 g of vegetables, 200 g of fruit, around 230 g of whole grains and 125 g of plant-based protein-rich foods, such as lentils, nuts and dry beans. The diet calls for a mere five servings of animal protein per person per week, including about 200 g of fish and 200 g of white meat. Controversially, the diet allows for just 100 g or so of red meat, around one and a half servings, per week. Rockström notes that’s a significant reduction from the roughly 700 g of red meat consumed each week by people in places such as North America and Europe, but it’s much more than the amount typically eaten by people in low-income countries.

Despite its dire environmental impacts, meat still has an important place in the global diet. On a nutritional level, the proteins and minerals from animal products could be a real boost for malnourished populations around the world, Hawkes says. “For infants who otherwise eat rice or starchy cassava, meat is an incredibly efficient way of boosting micronutrient status,” she says. What’s more, she says, “meat has tremendous cultural significance in people’s lives; it’s associated with high status”.

Expensive gains

After investigating the potential environmental impacts of the EAT-Lancet diet, the authors concluded that a nutritious diet for people could also be good for the planet. “We found that a healthy diet combined with sustainable agricultural practices would have positive impacts on biodiversity, land, water, nutrients and climate,” Rockström says. The most significant improvements tied to a change in diet would come from a reduction in phosphorus and nitrogen pollution in waterways and greenhouse-gas emissions. The commission estimated that the new meal plan could cut related greenhouse-gas emissions by about half, down to
would look much like the EAT-Lancet diet, but with a decidedly East African flavour.

The proposal from her institute is one of ten to make the finals of the Food System Vision Prize, a global contest sponsored by the Rockefeller Foundation in New York City. The winners are due to be announced in December. Kimani-Murage says she wants Kenya to move away from the sort of large-scale industrial farms that currently feed cities around the world. “Food has been so commercialized or commodified,” she says. “It’s produced for money and not for feeding people. We want to continue this local production of food even as the world urbanizes.” She notes that local food production could also significantly reduce the costs of production and shipping, potentially increasing the affordability of an EAT-Lancet-style diet.

Feeding people at a local level is the key focus of the 2021 UN Food Systems Summit. The current global diet, Kalibata says, is unbalanced, largely because of gaps in wealth and opportunity. In poor areas around the world, people tend to fill their stomachs with starchy, carbohydrate-heavy food because they can’t

PLANETARY-HEALTH DIET
If every person had a daily food allocation to sustain not only their health but also that of the planet, what would it look like? The answer to this question, in a study¹ from the EAT-Lancet commission, places an emphasis on plant-based foods, and recommends an amount of animal-derived protein much lower than that eaten in high-income countries but much higher than the amount consumed in low-income countries.

Macronutrient intake (grams per day)
Vegetables: 300
Dairy foods: 250
Whole grains: 232
Protein sources: 209 (legumes 75; nuts 50; poultry 29; fish 28; beef, lamb and pork 14; eggs 13)
Fruit: 200
Added fats: 52
Tubers or starchy vegetables: 50
Added sugars: 31

IMPACT OF UNUSED FOOD
Food loss (from post-harvest through the supply chain and up to, but not including, retail) and waste (at retail and consumption level) from the main food groups have negative environmental impacts. They all have a blue-water (cubic metres of water wasted), carbon (tonnes of carbon dioxide equivalent emitted) and land (hectares of land used) footprint per tonne of food lost or wasted.
Chart panel: Blue-water footprint. Food groups: cereals and pulses; fruits and vegetables; roots, tubers and oil-bearing crops; meat and animal products.
Cooperate to prevent food-system failure

As journalist Joan Didion wrote in her 1967 essay ‘Goodbye to All That’, “It is easy to see the beginnings of things, and harder to see the ends.” She was writing about her love affair with the city of New York, but the same can be said of COVID-19.

How the pandemic began is reasonably well understood: the virus, SARS-CoV-2, probably made its way from wild bats to humans through a food market. COVID-19 is the latest in a long line of diseases that have crossed from animals to people, including HIV/AIDS, severe acute respiratory syndrome and Ebola. In fact, 60% of emerging infectious diseases are zoonotic, and of the pathogens that cause these, at least 71% originate in wildlife¹. The reshaping of habitats around the world, often initiated by the need to grow more food, puts people in ever closer contact with wild animals and makes the transmission of infections more likely.

How the pandemic will end, and what damage it will cause, is less clear. So far, there is no end in sight. Many people will be affected forever: economically, physically, socially and psychologically. The World Bank estimates that up to 115 million extra people will fall into extreme poverty (living on less than US$1.90 per day) in 2020 owing to the economic shocks of the pandemic. This, in turn, will have significant impacts on food security, nutrition and health. It is projected that 130 million more people will face acute food insecurity by the end of 2020, in addition to the estimated 135 million who faced it in 2019.

The health of those who are already undernourished could decline further, particularly older, vulnerable and marginalized people. Disruptions to health care in many low- and middle-income countries owing to COVID-19 could lead to around 193,000 additional deaths among children per month². Obesity and non-communicable diseases are significant risk factors for hospitalization with COVID-19, and they can result in medical complications for both young and older people. Obesity and metabolic disorders are also factors in the disproportionate risks of hospitalization and death in low-income and ethnic minority populations in high-income countries. In Chicago, Illinois, for example, nearly 70% of the people who have died from COVID-19 were Black, although Black people make up only 30% of the population³.

Early evidence suggests that the pandemic is trouncing the functionality and efficiency of food systems (the activities involved in producing, processing, distributing, preparing and consuming food, and the people who influence those activities) in multiple ways. It is reducing

of unemployment and loss of income, this has resulted in an increase in the number of people struggling to access a healthy diet. Many people are already opting instead for staple grains and unhealthier, highly-processed foods that are cheaper and have longer shelf lives⁴. This pandemic, and the need to stave off the next one, bolsters the already compelling case for ensuring the global food supply is safe, nutritious and equitably distributed.

Governments and businesses should prioritize ensuring that producers are making healthy food and that consumers have access to it. They should support and invest in food-assistance programmes during and after the pandemic. Governments must support the United Nations’ $10-billion COVID-19 Global Humanitarian Response Plan, set up so its agencies can provide the most marginalized and vulnerable populations with basic services, such as COVID-19 testing materials, medical equipment, food, water and basic health coverage, such as vaccines. As of September, the programme had received less than 30% of its target.

The integrated One Health approach (addressing risks at the intersection of human, animal and environmental health) is crucial in responding to COVID-19, recovering from it and preparing for the next zoonotic pandemic. To minimize viral reservoirs and contact between virus-carrying animals and people, wildlife habitats must be protected against urbanization and deforestation. Governments need to police the illegal sales of wildlife in food markets and the global food trade, and to complement this with public-health disease-prevention programmes and messaging. Stronger surveillance tools to track potential zoonotic and food-borne illnesses across food systems are also needed.

These recommendations to ensure food systems function effectively during the pandemic and long after cannot work without a united global effort. Instead of the splintered responses to the COVID-19 crisis seen so far, involving political polarization and geopolitical competition, politicians must embrace global cooperation and inclusion. Governments should not face inward. They should double down on opportunities to re-engage and collaborate on the interlinked challenges of climate change, malnutrition and environmental collapse.

“This pandemic bolsters the already compelling case for ensuring the global food supply is equitably distributed.”

Jessica Fanzo is a food-system researcher and nutritionist at Johns Hopkins University in Baltimore, Maryland.
e-mail: jfanzo1@jhu.edu

1. Cutler, S. J. et al. Emerg. Infect. Dis. 16, 1–7 (2020).
2. Roberton, T. et al. Lancet Global Health 8, e901–e908 (2020).
3. Yancy, C. W. J. Am. Med. Assoc. 323, 1891–1892 (2020).
4. Belén Ruiz-Roso, M. et al. Nutrients 12, 1807 (2020).
Natural solutions for agricultural productivity

Scientists are pursuing sustainability strategies for intensifying production to tackle food security and environmental crises. By Michael Eisenstein

On paper, the global agriculture sector has done an admirable job of keeping pace with a growing population. According to the United Nations’ Food and Agriculture Organization, agricultural output per person has increased by 50% since 1960, impressive considering the number of mouths to feed has more than doubled.

But the reality is messier. Many people, including those in high-income nations, lack reliable access to nutritious food. And food security is an ongoing struggle for people in poorer regions. Even transient disruptions can have far-reaching consequences. One article¹ described the global food supply as being “on a razor’s edge”: weather events or natural disasters in one part of the world can cause the price of grain everywhere to spike by more than 50%.

“Globally, we have to increase food production by 60%, and in some areas we have to increase by 100%,” says P. V. Vara Prasad, a crop ecophysiologist at Kansas State University, Manhattan.

Over the past 50 years, producers increased agricultural output in much of the world through the ‘green revolution’. But this revolution has been environmentally harmful, relying heavily on chemical pesticides and fertilizers that have inflicted lasting damage on the soil and water supply. Natural biodiversity has been sacrificed to create vast monoculture fields. And in many low-income nations, survival depends on coaxing greater productivity from existing plots as more and more people scramble for limited resources, says Bernard Vanlauwe, a soil scientist based in Nairobi at the International Institute of Tropical Agriculture.

intensification. The specifics vary depending on the setting, but a growing number of examples from around the world highlight the possibility of a second green revolution, one that might better live up to its name.

efficiency, as well as more-dramatic measures that redesign the farming landscape.

Lucas Garibaldi, an agroecologist at the National University of Río Negro in Bariloche, Argentina, has focused on pollinators as a crucial component of what he calls ecological intensification. “Crop yield depends not only on the count of pollinators, but also on the biodiversity of pollinators,” says Garibaldi. “Millions of honeybees alone will not replace the function of diverse species of wild bees and butterflies and birds.” He notes that different bees pollinate different crops, but also allow more efficient pollination for some plant species. To create a haven for these airborne assistants, Garibaldi advocates minimizing pesticide use and including non-agricultural zones in farmland. These could be wild-plant borders that surround fields or just hedgerow-like strips of flowers that are appealing to the bees that traverse them.

Growing a mix of crops can have many benefits, including attracting pollinators. Conventional monoculture leaves soil exposed for much of the year, Garibaldi says. This creates opportunities for weeds to grow (necessitating herbicides) or leaves soil susceptible to erosion. With multiple crops or rotation throughout the year, more durable root
ers in Kenya, Uganda and Tanzania now use push–pull cropping practices when growing maize. They plant grasses around the edges of maize plots that produce chemicals that ‘pull’ a common pest, the maize stalk borer (Busseola fusca), away from crops, while the maize itself attracts parasitic wasps that prey on the stalk borer. The farmers also intersperse legumes of the genus Desmodium with the maize that enrich the soil with nitrogen, and produce compounds that ‘push’ away pests and kill off a genus of invasive weed known as Striga.

Crops rely on pollinators such as bees.

Sustainable soil management is a thorny issue, particularly in resource-limited settings. Vanlauwe notes that nutrient depletion is one of the greatest threats to yield for African farmers, making a hard-line approach to sustainability unrealistic. “People who say you can trigger agricultural development in Africa without fertilizer do not have on the ground experience,” he says. But there are environmentally friendly ways to feed the soil. Jo Smith, a soil scientist at the University of Aberdeen, UK, has been equipping farmers in Africa and Asia with anaerobic digesters: simple systems that use microbes to convert animal manure into biogas for fuel and leave a nutrient-rich bioslurry. “It’s like giving them a little fertilizer factory: it gives you available ammonium that the crop can take up quickly,” she says. The biogas is also less harmful than conventional fuels, reducing household air pollution and improving quality of life, Smith adds.

Much of the world’s farming takes place on smallholder plots. One study³ estimated that one-third of the global food supply is produced on farms of less than two hectares. This fragmentation can make it challenging to introduce sustainable intensification practices. “Smallholder production systems are absolutely risk-averse,” says Vanlauwe. “Falling from earning US$100 to $50 a month can be the difference between being not-hungry and being hungry.”

Close collaboration with individual farmers is needed, but this is difficult to achieve at scale. Fortunately, smallholders are increasingly participating in collectives that can accelerate information sharing and reduce the risk associated with adopting new cultivation strategies. In August⁴, Pretty and his colleagues reported that, worldwide, around 8 million such groups have formed over the past two decades. “That’s about 240 million people working in collective-action efforts around areas like irrigation, forest management, pest management and water,” says Pretty. By partnering with these groups, researchers can design programmes that are more likely to be compatible with social, cultural and environmental conditions, and establish local networks of collaborators to facilitate the dissemination of information.

Some governments are also taking a more active role. Ethiopia, for example, has focused on aspects of ecological repair by establishing ‘exclosure’ areas for depleted soils. “Areas are fenced off, and after about ten years the land starts to recover,” Smith says.

In China, Fusuo Zhang, a plant-nutrition specialist at the China Agricultural University in Beijing, and his colleagues are working with government officials to mobilize an effort to help smallholder farmers across the nation transition to more evidence-based, sustainable cultivation. This includes selecting seed varieties that are suited to a given plot, using modelling techniques to guide planting based on levels of sunlight, water and nutrients, and optimizing the timing and density of seed planting. “We sent faculty members and groups of students to live among the farmers in the villages, and work with them to try to change their management,” says Zhengxia Dou, an agricultural scientist at the University of Pennsylvania in Philadelphia, who collaborated with Zhang’s team. By 2015, the effort had grown to include nearly 21 million farmers across China, who, on average, achieved a more than 10% boost in yield while using around 15% less fertilizer and reducing their greenhouse-gas output⁵.

Many farmers in India are embracing a national programme known as zero-budget natural farming (ZBNF). This cultivation strategy involves using soil microbes and mulch rather than synthetic fertilizers to enrich lands. Farmers in several Indian states are pursuing the approach, including around half a million farmers in Andhra Pradesh. But some scientists

Smith concurs. “It was a political move, not a scientific move,” she says, adding that the natural farming approach has “not been properly trialled”. To assess the technique, she and her colleagues modelled the long-term impact of ZBNF on soil health. They found that the approach could meaningfully and sustainably improve nitrogen levels for low-yield lands, but that it would offer little benefit to farms already achieving high yields⁶. They concluded that a more targeted implementation of ZBNF is needed to protect overall national food security. Smith remains largely positive about ZBNF, which has been gaining momentum among farmers. “There’s a lot of good things about it, but it needs more science,” she says.

Outside national initiatives, smallholder sustainable intensive farming requires targeted investment and efforts to support social and economic stability. Vanlauwe contends that, in many parts of sub-Saharan Africa, environmental and political conditions mean that many farmers will continue to struggle at the margins for the foreseeable future. Still, he sees a path towards economic mobility. “Give them access to credit they pay back over time, and invest in integration and value-chains so they can get rid of or sell excess produce,” he says. “It’s about creating incentives and access systems.”

But durable change also requires building local expertise in crop and soil research, and in ecosystems. Many specialists in these areas are also involved with international education and training. For example, as director of the Feed the Future Innovation Lab for Collaborative Research on Sustainable Intensification, Prasad has helped to coordinate undergraduate- and graduate-level agriculture programmes in places such as Senegal, Cambodia and Bangladesh. Normally, these programmes take on a few dozen students at a time, but the shift to online training as a result of the coronavirus pandemic could prove to be a long-term gain for capacity building. “We are now talking to about 500 or even 1,000 students,” he says.

Michael Eisenstein is a science journalist in Philadelphia, Pennsylvania.

1. Cassman, K. G. & Grassini, P. Nature Sustain. 3, 262–268 (2020).
2. Pretty, J. M. Natural Res. Forum 21, 247–256 (1997).
3. Ricciardi, V. Glob. Food Security 17, 64–72 (2018).
4. Pretty, J. et al. Glob. Sustain. 3, e23 (2020).
5. Cui, Z. et al. Nature 555, 363–366 (2018).
6. Smith, J., Yeluripati, J., Smith, P. & Nayak, D. R. Nature Sustain. 3, 247–252 (2020).
On a summer morning in 2019, Andy Suhrbier pilots a small aluminium boat out to a mussel raft in a quiet cove on the eastern shore of Puget Sound in Washington State. As the boat approaches, a mother seal and her pup resting on the raft slip into the water. Suhrbier climbs from his boat onto the raft; the only sign of life is a vague smell.

Suhrbier tugs on a couple of ropes attached to one of the raft’s beams. Soon, a mesh-lined plastic cage emerges with water and silt pouring out of it. He picks off several sea stars and tosses them back into the water, then flips open the lid like a pirate opening a treasure chest.

Inside is more dark sediment, mostly waste from the mussels, the source of the smell. Suhrbier sifts through it. He is looking for something.

“Look at this monster!” he says, holding up a sea cucumber nearly a foot long. Its deep red body covered in orange bumps stands out from the muck like a gold doubloon. “That’s definitely market size.”

Suhrbier is a biologist with the Pacific Shellfish Institute in Olympia, Washington, a non-profit research organization that works to promote healthy wild shellfish populations and sustainable shellfish aquaculture along the US west coast. Two years earlier, he had put sea cucumbers in cages and suspended them beneath the mussel raft, as part of an effort to develop aquaculture in Puget Sound. The hefty size of the cucumbers is a promising sign.

Suhrbier and his colleagues think that sea-cucumber farming could have two benefits. First, the animals could help to prevent excess waste from building up underneath aquaculture installations, such as mussel rafts or net pens used to hold bony fish such as salmon. (Sea cucumbers, soft-bodied animals related to sea urchins, move slowly over the sea floor eating detritus: the vacuum cleaners of the ocean.) Second, a ready source of farmed sea cucumbers could reduce the poaching of wild stocks to feed the growing market in east and southeast Asia.
Sea cucumbers are retrieved from the mesh-lined cages at Puget Sound in Washington state.

on average. The dearth of knowledge about aquatic pathogens makes diseases hard to predict and spot.

It can also be a challenge to deduce their cause. For example, ice-ice disease results in bleaching of Kappaphycus seaweed, which is grown in large amounts in southeast Asia and Tanzania for the production of food additives, such as the thickening agent carrageenan. The disease has caused yields to plummet over the past decade, but “the causative agent is still not known”, says Valéria Montalescot, senior project manager for GlobalSeaweedSTAR, a four-year research project based at the Scottish Association for Marine Science in Oban, UK, which aims to boost knowledge about seaweed cultivation in low- and middle-income countries. Kappaphycus is usually grown from cuttings, so the whole crop across multiple countries might be the result of just a few clones, possibly making it more vulnerable to disease, Montalescot adds.

Diverse yields

Climate change is complicating efforts to fight disease. Higher water temperatures can alter the microbial community of a body of water, encouraging the growth of pathogens, as well as stressing organisms and making them more vulnerable to disease. One suggested cause of ice-ice disease is that temperature-stressed seaweeds release compounds that attract bacteria, for example.

And temperature is not the only issue. Both increased rainfall and salinity intrusion from sea-level rise can alter water chemistry in ways that are detrimental to aquaculture organisms. Storms can destroy aquaculture crops or infrastructure in the water and on land. “From an environmental point of view I think climate change is the greatest challenge” for the sustainability of aquaculture systems, says Nesar Ahmed, who studies global seafood sustainability at Deakin University.

Climate change also intersects with aquaculture’s pressure on water and land resources. Inland aquaculture demands 429 cubic kilometres of fresh water each year, much less than the demand from terrestrial agriculture, but still enough to pose a strain on increasingly drought-prone areas.

In south and southeast Asia, prawn cultivation has contributed to the destruction of 38% of the world’s mangrove habitats, which have a variety of important ecological functions, including sequestering carbon and buffering coastlines from storms and sea-level rise. The loss of mangroves has also resulted in saltwater intrusion rendering inland areas unsuitable for terrestrial agriculture.

Some farmers are now producing prawns among intact mangrove stands. Although there are concerns that this practice might also damage the health of the mangroves, it is part of a larger trend to create aquaculture systems that include multiple species and involve interrelationships more like the ones that keep natural ecosystems in balance.

Some examples of this integrated aquaculture are long-established, such as stocking rice paddy fields with fish or prawns. The animals eat pests and fertilize the rice crop, increasing rice yields and providing an extra source of protein or income for small-scale farmers, Ahmed says. Growing two species in a single body of water also reduces overall water use.

But in practice, it can be difficult to quantify these benefits. For example, because nitrogen moves freely through water, it is difficult to track uptake of excess nitrogen produced by bony fish by seaweed growing nearby. And then there are the complexities of managing an operation with multiple species: not just producing them but also harvesting, processing and marketing them.

Suhrbier knows such difficulties well. The sea cucumbers he and his team harvested from under the mussel raft were the right size, weight and colour for the export market, but the mussel producer he was working with was unable to renew its permit at that location. The raft was lost, and with it Suhrbier’s chance of follow-up experiments to develop sea-cucumber aquaculture techniques. “I was really shocked and saddened to see that go because it was one of those places where it just makes a lot of sense for sea cucumbers to be,” Suhrbier says. The new location of the producer’s rafts isn’t a good habitat for sea cucumbers.

Suhrbier is still experimenting with growing sea cucumbers alongside other types of aquaculture operation around the Puget Sound area. But, like an increasing number of aquaculture researchers, he is beginning to think that producing the animals needs to move in a simpler and more radical direction. Growing sea cucumbers in cages is labour intensive. What if the animals are placed in the vicinity of aquaculture operations and left to roam freely, like a marine equivalent of a ranch or even a permaculture system?

“If we could mainly enhance the wild population around these areas, I think that would be a great benefit for everybody,” Suhrbier says. “I’m trying to have something that fits in: easy, cost effective and as passive as it can be.”

Sarah DeWeerdt is a freelance writer in Seattle covering biology, medicine and the environment.
If you want to do right by the planet, the general advice is to eat an abundance of fruit and vegetables, as well as whole grains and nuts, and to consume less meat, dairy and processed foods. Add some high-fibre fare, fermented food and fish a few times a week, and you could be eating your way to better mental health, too.

Those are the recommendations from nutritional psychiatry research. The field is built around growing evidence from population studies and clinical trials in the past decade suggesting that dietary improvements might not only improve mood but also treat common and severe mental illnesses. Traditional diets consumed by people in places such as the Mediterranean, Norway and Japan are associated with a lower risk of depression (one of the most common mood disorders) and, to a lesser extent, anxiety. A change in diet can alleviate symptoms of depression, even among people with severe forms of the condition.

Researchers have also been investigating how the trillions of microorganisms in the human gut communicate with the brain to influence the processes that take place there. Imbalances in the gut microbiome have been linked to a range of neurological disorders, including Alzheimer’s disease, autism spectrum disorder, multiple sclerosis, Parkinson’s

adequate quantities, confer health benefits on the host, have had compelling results, such as reduced symptoms of depression in women who have recently given birth. But others have shown no more effect than a placebo. Comparisons are difficult owing to the varying doses and strains of bacteria used in trials. Similarly, although promising, the evidence from human trials testing specific prebiotic foods (those that are rich in high-starch dietary fibres that stimulate beneficial bacterial colonies in the gut) is insufficient to draw clear conclusions.

At the Food and Mood Centre, researchers focus on the entire diet, rather than individual ingredients or certain strains of bacteria delivered in probiotic supplements. “There

them all, people need to eat a well-rounded diet.

On the question of what mix of microbes makes for a healthy microbiome, diversity is thought to be important. People with a greater variety of bacteria in their gut seem to be healthier. A Western diet of processed foods that is high in fat and sugar and low in dietary fibre and micronutrients seems to be detrimental to both the gut and the mind, reducing the diversity of the gut microbiome, increasing inflammation and elevating the risk of depression.

The long-running SUN (Seguimiento University of Navarra) cohort study has been recruiting university graduates in Spain since 2000 to analyse associations between dietary patterns and health, including depression. One of its findings is that the more ultra-processed foods (usually, energy-dense foods that are significantly altered from their original state) you eat, the greater your risk of depression². Over the past decade, observational studies of this kind have consistently shown that well-rounded diets that are low in ultra-processed foods confer some protection against depression. But public-health researcher Almudena Sánchez Villegas at the University of Las Palmas of Gran Canaria, Spain, says more randomized controlled trials testing specific dietary interventions are needed. This will
When Laura Domigan started her research group at the University of Auckland, New Zealand, in 2015, she hoped to continue her work developing protocols for growing cell-based meat in the laboratory. But with funding for cultivated-meat research practically non-existent in academia at the time, Domigan pivoted to working on biomedical materials for use in tissue engineering. A protein biochemist by training, she focused her efforts on creating artificial corneas for eye surgery, a far cry from anything resembling a lab-grown steak.

Still, she never gave up on her dream of studying in vitro meat. “I had to be super patient and keep trying,” Domigan says. And although it took several years, Domigan’s strategy eventually paid off.

Initially, she secured funding for a PhD student to begin developing formulations of nutrient media to grow cell-based meat. Then, in October 2020, a team led by Domigan won a multi-million dollar grant from the New Zealand and Singaporean governments to explore questions such as which cells are the best starting material for cultured meat, and is the nutritional profile of meat grown in a lab equivalent to the real thing. “There is so much research that needs to be done,” Domigan says. And much of it is only beginning to happen, at least in any sort of transparent way.

Investors have poured hundreds of millions
Media matters
According to an analysis by the GFI⁴, growth
media currently make up the bulk of total
production costs for cultivated meat, and
proteins known as growth factors are the
most expensive ingredient. Costs are com-
ing down, as start-ups dedicated to serving
the cellular-agriculture industry devise ways
Research round-up
Highlights from sustainable-nutrition research.
By Dyani Lewis

Rapeseed crops depend on pollinators such as bees.

Farming trends deplete pollinators

Most cultivated crops depend on insect pollinators, such as bees, but global crop trends are leaving pollinators worse off.

Using data from the United Nations’ Food and Agriculture Organization, an international team, led by Marcelo Aizen at the National University of Comahue in Rio Negro, Argentina, assessed changes in the amount of land used for agriculture and the types of crops cultivated between 1961 and 2016. During that time, the area of land used to grow crops increased by around 40%, and pollinator-dependent cropland more than doubled. Soya bean, rapeseed and oil palm, crops associated with deforestation and diversity loss, account for much of the expansion and for the increase in pollinator dependence.

But although the land used has increased, crop diversity has remained largely the same since 2000. Producers have opted for large-scale cultivation of one crop. That’s a problem because monocultures don’t provide pollinators with a stable, year-round supply of food. This ultimately leads to a fall in insect numbers, lower yields and increased deforestation as demand for land surges.

Greater reliance on crops that are dependent on single-species pollinators, coupled with declining pollinator populations, could cause problems for food security. Poorer regions will be the hardest hit by crop failures, but higher-income countries that rely on imported food will also be affected.

Rotating a diverse range of crops on a single piece of land could help to stem the decline in pollinator populations. Planting native flowers and hedgerows on agricultural land and restoring neighbouring natural environments could also preserve pollinator habitats.

Glob. Change Biol. 25, 3516–3527 (2019)

US household food waste calculated

Working out how much food goes uneaten in an individual household is notoriously difficult. Comprehensive data on how much food ends up in the bin does not exist. But Yang Yu and Edward Jaenicke at Pennsylvania State University in University Park used a new method to overcome the lack of data.

Instead of trying to measure food waste directly, Yu and Jaenicke calculated a household’s ability to efficiently convert food brought into the household into the energy required to maintain the body weight of its residents. First, they obtained data on food purchases from around 4,000 households that took part in the 2012 US Department of Agriculture’s National Household Food Acquisition and Purchase Survey. The authors then calculated the metabolic energy requirements of the people living in each household from attributes such as height, weight, age and gender. The amount of food waste was estimated according to the difference between the household’s food inputs and its members’ energy requirements, not accounting for overeating.

The study showed that the average household wasted close to one-third of the food that it bought, which means that the United States wastes an estimated US$240 billion worth of food per year. The most efficient household in the study wasted about 9% of its food. Healthier diets created more waste than unhealthier diets, owing to the greater proportion of fruit and vegetables. Higher-income households wasted about 50% more food than lower-income households,
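
The household estimate described for the food-waste study (food inputs minus members’ energy requirements, ignoring overeating) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the statistical model that Yu and Jaenicke used: the `Household` class, its field names, the choice to treat a shortfall as zero waste and the example figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Household:
    # Hypothetical fields: energy content of all food brought into the home,
    # and the summed metabolic energy requirement of its members (same period).
    food_energy_acquired_kcal: float
    energy_required_kcal: float


def waste_fraction(h: Household) -> float:
    """Share of acquired food energy not accounted for by members' needs."""
    if h.food_energy_acquired_kcal <= 0:
        raise ValueError("acquired food energy must be positive")
    surplus = h.food_energy_acquired_kcal - h.energy_required_kcal
    # Assumption: a household that acquires less than it needs wastes nothing.
    return max(surplus, 0.0) / h.food_energy_acquired_kcal


# Invented figures for illustration only.
example = Household(food_energy_acquired_kcal=3_000_000,
                    energy_required_kcal=2_000_000)
print(f"Estimated waste: {waste_fraction(example):.1%}")
```

Expressing waste as a fraction of food acquired, rather than as an absolute amount, makes households of different sizes comparable, which is how the study’s headline figures (about one-third on average, about 9% for the most efficient household) are reported.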
If everyone ate a balanced diet featuring more plant-based and sustainable animal-sourced food, up to eight billion tonnes of carbon dioxide emissions might be avoided globally each year by 2050, according to the 2019 special report on climate change and land by the Intergovernmental Panel on Climate Change (IPCC).

Modifying diet on a global scale is a major opportunity to combat climate change, argues the report.

Naoko Ishii, an economist at the University of Tokyo’s Institute for Future Initiatives, agrees. “One of the biggest risk factors for the planet’s health is our food system,” she says. “The way we eat needs to change.”

That opinion might be gaining widespread acceptance, but scientists don’t know how to bring about the reforms needed, on the scale required.

Much of the world’s population, even in relatively rich nations, cannot afford the kind of sustainable plant-based diet that scientists favour. As the IPCC special report notes, mitigating climate change through dietary modification relies on consumers altering their choices and preferences. These, in turn, are guided by “social, cultural, environmental and traditional factors, as well as income growth”, the report says, all of which are hard to shift.

Studies on which levers for changing food behaviours work best are surprisingly scant. Most research concentrates on richer and Western countries, which is where the majority of behavioural changes are needed. By comparison, data on what needs to happen in poorer and subsistence-farming communities are almost non-existent. Because the food behaviours of these communities are thought to be much more sustainable than those of industrialized economies, the focus for these societies is less on pushing urgent changes and more on managing social changes to ensure unsustainable behaviours aren’t introduced.

The IPCC report lists school food procurement, health-insurance initiatives and public-awareness campaigns as examples of policies that can potentially change demand. But research to quantify the effects of various interventions, such as taxes, labelling or changing in-store food displays, suggests that
Kangaroo meat is a sustainable alternative to beef, but consumers can be reluctant to switch.

won’t be enough,” says Lawrence.

Most of the evidence on changing food behaviours comes from work on tackling obesity. Findings from dietary studies with a focus on health are being examined for their applicability to the younger field of food sustainability.

One of the methods routinely deployed to encourage healthy diets uses labels designed to inform consumers about the nutritional value of food. The traffic-light system in the United Kingdom, for example, gives shoppers an idea of how healthy a product is or isn’t at a glance. The evidence of the effectiveness of this sort of intervention is encouraging.

The OECD estimates that between 50% and 60% of shoppers check nutritional labels at least some of the time. Research established that labels indicating a product’s health credentials, or lack thereof, are linked to an 18% increase in people buying healthier food⁶. Labelling on health grounds influences food behaviour, says one of the authors of the study, Michele Cecchini, a health-policy analyst at the OECD’s health division in Paris. “I don’t see why the same wouldn’t also apply to other issues that consumers care about, like sustainability,” he says.

Ishii says that only a proportion of consumers need to change their behaviour for labelling information to have an impact. “A relatively small number can influence the brand to change, and therefore they can influence the wider supply chain,” she says.

Cultural matters

A 2020 survey⁷ of close to 1,200 people across 12 European countries and Uganda highlighted the influence that culture can have on food behaviours. For example, the majority of people surveyed in each of the European countries disagreed with statements such as “a particular food is chosen because it makes me look good in front of others”. In Uganda, however, more participants agreed with such statements.

“I don’t think we’ll be able to address food behaviour on a global level in a uniform way,” says Suzanne Kapelari, an educational scientist at the University of Innsbruck, Austria, and an author of the study. “The more we know about the cultural attitudes to food and behaviours, the better, but there’s quite a bit of work to be done on that.”

Food behaviours in higher-income countries such as many OECD member states are different from those in middle- and low-income countries. Consumers in wealthy countries buy more meat, and packaged and processed foods. “It’s been like this for decades in high-income countries,” says Lawrence. People in low-income countries, by comparison, often eat less meat and opt for locally produced products with less packaging.

The emphasis in high-income countries is, therefore, on correcting unsustainable behaviours, whereas in low- and middle-income countries, it’s on preventing unsustainable behaviours becoming the norm.

“We have to be careful here because we don’t want to be sitting in ivory towers telling

Unintended consequences

Although eliminating or significantly reducing meat consumption would help the environment, evidence suggests this is unlikely to happen at scale because many meat-eaters are reluctant to change their eating habits. A better strategy, some researchers argue, is to shift consumer preferences from high carbon-producing meats, such as lamb and beef, towards meats with a lower environmental impact, such as chicken and pork.

In a 2019 study⁸, marketing experts in Belgium reorganized a butcher’s counter, increasing the space given to poultry and decreasing the space for red meat. This led to a 13% increase in chicken sales in 4 weeks. The only trouble is that sales of red meat didn’t fall in tandem, so the net result was a greater amount of meat sold, albeit not significantly.

Although this was one small study, it demonstrates a broader point: there is no single solution to the problem of how to change consumers’ behaviour. “The common feature in all these areas is their limited effectiveness,” says Sassi.

The hope is that applying a range of methods in a coordinated way will have a cumulative effect. But that hope lacks a solid evidence base. Researchers are even unsure whether different groups respond to different methods. “The truth is that we don’t really know,” says Sassi. “It’s a gap in our evidence.”

Benjamin Plackett is a freelance science writer based in London.

1. Willett, W. et al. Lancet 393, 447–492 (2019).
2. Barosh, L., Friel, S., Engelhardt, K. & Chan, L. Aust. N. Z. J. Public Health 38, 7–12 (2014).
3. Rejman, K., Kaczorowska, J., Halicka, E. & Laskowski, W. Public Health Nutr. 22, 1330–1339 (2019).
4. Teng, A. M. et al. Obes. Rev. 20, 1187–1204 (2019).
5. Hoek, A. C., Pearson, D., James, S. W., Lawrence, M. A. & Friel, S. Food Qual. Pref. 58, 94–106 (2017).
6. Cecchini, M. & Warin, L. Obes. Rev. 17, 201–210 (2016).
7. Kapelari, S. et al. Sustainability 12, 1509 (2020).
8. Coucke, N., Vermeir, I., Slabbinck, H. & Van Kerckhove, A. Foods 8, 186 (2019).
9. Xue, L. et al. Environ. Sci. Technol. 51, 6618–6633 (2017).
Although hunger was steadily declining for decades, progress stalled in 2015¹ and since then, the number of people who suffer from hunger has slowly increased. In 2018, there were more than 820 million people hungry². Around 2 billion people worldwide have micronutrient deficiencies and 149 million children are stunted³. And yet in 2018, 40 million children under five were overweight, and in 2016, 2 billion adults were overweight, with a third of these obese. Many countries are now challenged by what’s known as the double burden of malnutrition (DBM), in which people are simultaneously overweight and malnourished.

The double burden of malnutrition is closely associated with nutrition transition⁴ which is tracking both demographic transition (where countries shift from patterns of high to lower fertility and mortality) and epidemiological transition (where the prevalent disease burden shifts from infectious to chronic and degenerative disease). With urbanization, economic growth and technological change, diets shift from starchy, low variety, low fat and high fibre towards the ‘Western’ pattern of increased fat, sugar and processed foods. The effects of malnutrition have also been shown to be intergenerational, where maternal nutrition can set a trajectory for life-long health of her offspring⁵.

In the 1990s, DBM mostly affected the highest income countries of the group of low- and middle-income countries (LMICs), but in the last decade it has been seen in even the poorest of countries, including in rural areas and in people with low incomes. DBM is seen in around 14 million children in Asia and 9.6 million children in Africa.

Syndemics

The global synergistic epidemic (the ‘syndemic’) of overnutrition, undernutrition and climate change was described as the greatest challenge for human and planetary health in the 21st century⁶ and has been compounded by unprecedented challenges to food and nutrition security in 2020. In the biggest upsurge seen in decades, desert locust outbreaks in East Africa and South Asia have had disastrous effects on local food supplies. COVID-19, a pandemic inextricably linked in cause and effect to food systems, could push up to 100 million people into extreme poverty⁷. Lockdowns to deal with the spread of COVID-19 disrupted global supply chains and national, local and household economies. With the increase in poverty, and hindrance of essential interventions and food security, COVID-19 could reverse many of the hard-won gains in maternal and child nutrition of recent decades. Furthermore, overweight and obesity are associated with increased likelihood of hospitalization, ICU admission and worse outcomes in COVID-19.

Building solutions

It is now evident that human and planetary health cannot be disentangled, nor can nutrition be considered in
SPONSOR FEATURE
Event Organizers: Ajinomoto Group; Ministry of Agriculture, …
EFFECTING CHANGE IN A FRACTURED LANDSCAPE

The double burden of malnutrition (DBM), the combination problem of both too many calories and simultaneous malnutrition, is increasing in low- and middle-income countries. This, along with the climate emergency and COVID-19 pandemic, is having a significant impact on global health and increasing the risk of non-communicable diseases.
The Nutrition for Health breakout session of the Food Systems for Nutrition and Health workshop, comprised of experts from industry, government and academia, discussed existing policies to tackle DBM. Examples include Ghana’s ‘Planting for food and jobs’, which has helped communities to expand their land use, and community dams that support fish farms and improve land irrigation. Japan worked to improve undernutrition following the Second World War, and overweight and obesity following economic growth, through education and improving access to healthy and nutritious food, including for low-income populations as part of universal health coverage.
Tackling DBM needs behaviour change of all food system actors and consumers, backed by a deep understanding of what drives people’s decisions in nutrition and health. All stakeholders, including governments, food companies, the World Health Organization, the UN Food and Agriculture Organization, the International Union of Nutritional Sciences, non-government organizations and policy makers, will need to be involved to facilitate behavioural change and provide food environments that incentivize healthy eating. Government approaches could include taxing highly-processed foods and sugary drinks, improving transport infrastructure for affordable delivery of food, providing safety nets for consumers who cannot afford a healthy diet, investing in public awareness campaigns, and making nutrition a focus for health both in schools and for the general public. Companies need to invest in innovation to improve the nutritional value and safety of their food, but not at the expense of palatability. Investment in R&D is expensive and needs support to develop a sustainable business case that does not compromise affordability.
Change is even more important in light of the COVID-19 pandemic, which demonstrated the impact of overweight and nutrition on the progress of infectious disease. The strategies to keep the spread of COVID-19 under control have left some people isolated, and many have lost their jobs and experienced serious economic hardship. The experience of countries such as Japan, which has survived natural disasters, has shown the importance of nutritional preparedness. While staples that can be stored long-term are a defence against hunger, food support should be safe and nutritionally balanced, with the right levels of protein, lipids, vitamins, minerals and dietary fibre. The store of staples can be buttressed by local production of nutrient-dense food that is accessible and affordable.

PERSPECTIVES FROM JAPAN: TOWARDS NUTRITION FOR GROWTH, TOKYO 2021
Inadequate nutrition is still a leading cause of global deaths.
“Figures show that around 38%, or 5.3 million children, are dying before their fifth birthday. According to a 2013 Lancet paper, 45% of those deaths are due to malnutrition. In other words, it is estimated that if those infants had been able to get adequate nutrition, they would not have died from infectious diseases such as diarrhoea, measles, acute respiratory infections, malaria, tuberculosis and AIDS,” said SANTO Akiko, President of the House of Councillors, the National Diet of Japan.
Nutrition has an important role to play in prevention and control of disease, including COVID-19.
“Japan achieved a dramatic reduction in the morbidity and mortality of communicable diseases…through nutrition improvement after the Second World War. Recent studies…have shown a close relationship between COVID-19 and malnutrition. Therefore, taking measures to improve nutrition is critical,” said SHOBAYASHI Tokuaki, Director General, Health Service Bureau, Ministry of Health, Labour and Welfare.
This requires a shift towards better food and healthier diets.
“COVID-19 [has] brought a significant demand shift from eating out to home,” said OSAWA Makoto, Vice-Minister for International Affairs, Ministry of Agriculture, Forestry and Fisheries, Japan. “We take this opportunity to propose healthier and more sustainable dietary habits by promoting traditional local diets based on local production for local consumption.”
Achieving an improvement in diet and sustainable eating requires a policy shift.
“Investment in health system strengthening is a prerequisite for sustainable development and economic growth. Good nutrition is a foundation for healthy lives and sustainable health systems,” said MIMURA Atsushi, Deputy Vice Minister of Finance for International Affairs, Ministry of Finance.
The World Bank’s Human Capital Project is an important part of the process.
“The Human Capital Project considers nutrition as an essential element in unlocking human potential and economic growth,” added Mimura.
The current global situation and the forthcoming summit provides an opportunity to transform the way the world tackles the global malnutrition challenge.
“We expect countries to review their existing nutrition policy, consult with other nutrition stakeholders and announce ‘SMART’ commitments at the Tokyo N4G Summit 2021,” said ONO Keiichi, Ambassador, Director-General for Global Issues, Ministry of Foreign Affairs of Japan.
The panel heard that the One Health Approach, which looks at the interaction between human and animal health and the environment, will be critical, particularly with the threat of future zoonotic diseases. Approaches aimed at improving human health also need to consider animal health and the environment, and this requires collaboration and cooperation between researchers involved in all these fields. Education at all levels (including medical schools) is required for precision planetary, population and personal nutrition and health.
Data and evidence was also discussed as a global challenge related to nutrition. Researchers need data collected at a local level to track DBM, but there isn’t enough available. Better and more granular data, collected more frequently, will help to track diet, weight, blood sugar and/or health. This can be improved by working with local academic institutions, food companies and schools, and by making the most of locally available and low-cost point-of-care technologies and innovations.
To really make an impact, the nutrition community needs to be involved in the political decision-making process. As well as being able to provide support and evidence, nutrition professionals should assess the trade-offs and challenges that policymakers face and feed this into education and research. Countries like Sri Lanka and Japan have nutritionists embedded within their ministries of health. Japan also has dedicated nutritionists working within the Ministry of Agriculture, Forestry and Fisheries, the Ministry of Education, and the consumer affairs agency. In order to beat DBM and other nutritional challenges, nutritionists need to reach out to, and work with, other sectors to find common solutions.
BREAKOUT SESSION: BUSINESS FOR NUTRITION
… population as a whole. It could … a role in influencing dietary …
improving nutritional literacy to promote behaviour change. Japan has nutrition training in schools. Israel is taking a more positive approach by labelling the healthy foods.
The panel acknowledged that the COVID-19 pandemic has impacted national economies, and constrained budgets, which will make setting aside resources for improving nutrition more difficult for both domestic governments and overseas donor governments. Innovative funding sources and private sector finance will need to step up. Investment in research and development will need to come from industry, not-for-profit funding groups and governments.
Consumer behaviour and choice is shaped by advertising and marketing messages. The panel agreed that changing behaviours and shaping markets, for both industry and consumers, will need a multi-stakeholder approach, and transparent and evidence-based discussions with clear accountability.
The first United Nations Food Systems Summit in 2021 along with the Nutrition for Growth Summit due to be held in Tokyo in 2021 provide good opportunities to advance this agenda.