You are on page 1of 25

Highlights 2021

ebi.ac.uk

European Bioinformatics Institute (EMBL-EBI)


Contents
Who we are 2 Training 24
Foreword 3 Strategic partnerships 28
Economic impact of open data 4 Culture and infrastructure 31
2021 in numbers 6 New leadership 36
Highlights of the year 8 Facts and figures 38
Data resources 10 Our funders 41
Research 20 EMBL-EBI leadership 42

© 2022 European Molecular Biology Laboratory

This publication was produced by the Communications team at


EMBL’s European Bioinformatics Institute (EMBL-EBI).

Structure of E. coli contaminant protein YadF co-purified with a


plant protein. The structure was solved with the help of AlphaFold.
PDBe accession: 7sev. AlphaFold DB accession: P61517.

Cover illustration: Karen Arnott

For more information about EMBL-EBI please contact: comms@ebi.ac.uk


EMBL-EBI HIGHLIGHTS 2021 FOREWORD

Who we are Foreword


EMBL’s European Bioinformatics Institute (EMBL-EBI) is the world’s leading The COVID-19 pandemic has shown The result will be fundamental research that
source of public biomolecular data. We enable life science research and its without a doubt what the collaboration expands what we know about life on earth
translation to medicine, agriculture, industry and society by providing biological between scientists, governments and and that sparks new ways of addressing the
data, tools and knowledge. industry can achieve. challenges our world is facing. Bioinformatics
and data science are a vital part of this
Innovation happened at pace – from vaccine programme, which sets the stage for
We are one of the six sites of the European Our missions development to novel epidemiological innovative multidisciplinary collaborations.
Molecular Biology Laboratory (EMBL), models and drug repurposing – highlighting
Europe’s leading life sciences organisation. • To freely provide data and bioinformatics that fundamental research can have the We would like to thank our funders,
EMBL conducts world-class life sciences services to the scientific community in most surprising and useful applications. including EMBL’s member states, UK
research, provides training and state-of- ways that promote scientific progress. The pandemic also made it clear that we are Research and Innovation, the National
the-art research infrastructures. EMBL is living in a world where bioinformatics and Institutes of Health, the European
an intergovernmental organisation with • To contribute to the advancement of data science are crucial to addressing global Commission and Wellcome, among others,
27 member states. EMBL’s latest scientific biology through investigator-driven challenges, from health and disease to food for their continued support and confidence
programme – Molecules to Ecosystems – research in bioinformatics. security, and biodiversity conservation. in our work.
seeks to better understand life in context,
from molecules to ecosystems. • To provide bioinformatics training to EMBL’s new scientific programme beginning We’re looking forward to working with
scientists at all levels. in 2022, called Molecules to Ecosystems, existing and new collaborators to continue
Our vision is an ambitious endeavour to bring biology developing our research efforts and the
• To disseminate cutting-edge technologies out of the lab and study life in context – data infrastructure that underpins scientific
To benefit humankind by advancing to industry and applications of science. from complex ecosystems in oceans to the discoveries worldwide.
scientific discovery and impact through bacterial communities in the human gut.
bioinformatics. • To support, as an ELIXIR Node, the Ewan Birney and Rolf Apweiler
coordination of biomolecular data EMBL-EBI Directors
provision in Europe.

2 3
EMBL-EBI HIGHLIGHTS 2021 ECONOMIC IMPACT OF OPEN DATA

The value of
EMBL-EBI
Economic impact data resources

of open data Use


value
Researchers spend 140
million hours/year using
“EMBL-EBI managed data resources represent excellent EMBL-EBI data resources.
value for money, and open data is critical for life sciences.” That’s equivalent to
Neil Beagrie, Director of Charles Beagrie Ltd. £5.5 billion.

Open data underpin the life sciences, and The study used multiple approaches to
the open data resources EMBL-EBI manages assess economic value, including an online Critical
are essential to the work of millions of survey which received more than 4,900 infrastructure
scientists around the world. This profoundly responses. This was the largest study on
58% of respondents
collaborative, global data ecosystem is so open data in recent years, and revealed that
said they couldn’t have
deeply embedded in how science works the use and impact of open data in the life
collected the last dataset
that, without it, many research labs would sciences is growing year on year.
they used or obtained it
grind to a halt.
elsewhere.
Sustainable funding and international
In 2021, an independent study by Charles collaborations are crucial to ensure that
Beagrie Ltd. estimated the economic EMBL-EBI can maintain and develop open
value and impact of EMBL-EBI data data resources in line with the needs of the
resources. The study found that EMBL-EBI scientific community, so science and
“EMBL-EBI resources are the lifeblood of my
Return on
is a crucial research infrastructure for the society can truly reap the benefits.
research group. Without these services, a
investment
global scientific community, and that the full two-thirds of my group’s research efforts EMBL-EBI data resources
data resources managed by the institute would suffer dramatically.” contribute to the
represent exceptional value for money. realisation of future
Survey respondent, Belgium
research impacts worth
£2.2 billion annually.

“[Without EMBL-EBI resources] my work


would be severely affected. The resources
of EMBL and EMBL-EBI allowed me to work
even during the lockdown.”
Survey respondent, Italy

Read the
report
4 5
EMBL-EBI HIGHLIGHTS 2021 2021 IN NUMBERS

2021 in numbers
over
40 256
open data journal papers
resources published
from
78
837 countries
over
members
of staff* 20
petabytes of data
*includes fellows,
supernumeraries,
497,000 deposited into
EMBl-EBI resources
unique IP addresses
trainees and visitors
accessed our
online training

from
189 41
active grants million unique
3,900 IP addresses
people participated 107
in our 46 public million requests to
engagement our websites on an
activities average day
147
collaborative grants
with researchers in
52 countries from
644 institutes

6 7
EMBL-EBI HIGHLIGHTS 2021 HIGHLIGHTS OF THE YEAR

Highlights of
Thornton group uses UK Biobank data to investigate
genetics of age-related diseases (page 21)

the year
Finn group identifies 140,000 virus species in the human
gut and hundreds of new bacteria species on human skin
(page 22)

AlphaFold Database for protein structure predictions Data Use Ontology for streamlining usage of biomedical
launches (page 12) data in research launches (page 29)

3D-Beacons network gathers publicly-available protein REMBI metadata guidelines for bioimages published to
structure data in one place (page 17) improve microscopy data reuse and analysis (page 16)

COVID-19 Data Platform continues to grow (page 14)


Proteomics data usage and interoperability shows potential
Gerstung group tracks SARS-CoV-2 lineage spread (page 20) for biomedical research (page 18)

Marioni group analyses differences in immune responses to CABANA project for increasing bioinformatics capacity in
COVID-19 (page 20) Latin America wraps up (page 25)

Leach group identifies drugs that could be repurposed for Training Competency Hub increases focus on FAIR training
COVID-19 treatment (page 20) (page 24)

PGS Catalog, an open database for polygenic risk scores, First Bioinformatics for BioBusiness event for small and
launches (page 17) medium companies takes place (page 28)

Darwin Tree of Life Data Portal, which will host genomic Strategic partnerships with cloud providers commence
data for thousands of species, launches (page 19) (page 34)

Iqbal group and CRyPTIC project launch world’s largest Transition to new EMBL visual framework (page 34)
database for drug resistance in tuberculosis (page 21)

8 9
EMBL-EBI HIGHLIGHTS 2021 DATA RESOURCES

Data resources
EMBL-EBI manages the world’s most comprehensive suite of open data resources
for the life sciences. Our 40 data resources and dozens of tools span genetics,
genomics, proteins, chemistry, literature data and more.

Chemicals, Genes, Proteins Imaging and Genetic variation Literature and


molecules and genomes cellular and disease data knowledge
drug discovery and RNA structure management

ChEBI ArrayExpress AlphaFold DB BioImage Archive COVID-19 BioModels


ChEMBL Ensembl Enzyme Portal Electron Microscopy Data Platform BioSamples
MetaboLights European Nucleotide InterPro Data Bank European BioStudies
Archive Electron Microscopy Genome-phenome
Open Targets PDBe Archive Complex Portal
Expression Atlas Public Image Archive
SureChEMBL PDBe-KB European Variation Europe PMC
HGNC Pfam Archive GWAS Catalog
MGnify PRoteomics Mouse informatics IntAct
Rfam IDEntification OmicsDI
RNAcentral UniProt Ontologies
VectorBase UniProtKB Reactome
WormBase

10 11
EMBL-EBI HIGHLIGHTS 2021 DATA RESOURCES

AlphaFold ushers in a new era in biology ALPHAFOLD DATABASE IN 2021

A protein’s 3D structure is closely related Building on decades of expertise in making


to its function, so knowing its shape is the world’s biological data available, EMBL-
essential for understanding how it works, EBI is working with DeepMind to grow the
and why sometimes things go wrong. database to 130 million predictions. We’re
But predicting a protein’s structure is no dedicated to ensuring the predictions
easy task. Since the 1970s, this has been are Findable, Accessible, Interoperable 992,000 48 365,000 15,000
one of the biggest unanswered questions and Reproducible (FAIR) so researchers protein structure species unique users views of first
in biology, with many experimental and everywhere can exploit them to drive predictions AlphaFold
computational scientists dedicating their discovery. webinar
careers to it.
“With this resource freely and openly
This all changed when artificial intelligence available, the scientific community will be
(AI) company DeepMind released its able to draw on collective knowledge to
ground-breaking AlphaFold system, which
accelerate discovery, ushering in a new era SAMEER VELANKAR
uses deep learning to predict protein
for AI-enabled biology.” – Sir Paul Nurse,
structures more accurately than any
previous computational methods. Nobel Laureate for Physiology or Medicine, AlphaFold database – a look behind
Director of the Francis Crick Institute the scenes
The real gains started when DeepMind
teamed up with EMBL-EBI to make its “AlphaFold has not only reinvigorated structural biology
predictions freely and openly available to but will have far reaching impact on life science research in
all through the AlphaFold Protein Structure the coming decades,” said Sameer Velankar, Team Leader
Database. at EMBL-EBI.

Hailed by Forbes as “the most important Sameer was one of the people who made the collaboration between EMBL-EBI and
achievement in AI – ever”, AlphaFold is a DeepMind a success. Thanks to the dedication of Sameer and his team, almost one
perfect example of the virtuous cycle of million protein structure predictions have already been made accessible through the
open data. The system was ‘trained’ on AlphaFold Protein Structure Database.
data generated by experimental scientists.
These data are freely available in public Sameer studied structural biology at the Indian Institute of Science in Bengaluru,
databases, such as those co-hosted by India before doing a postdoctoral degree in Oxford, UK. At EMBL-EBI he leads the
EMBL-EBI (PDB, UniProt and MGnify). Protein Data Bank in Europe (PDBe) team.

By making AlphaFold’s predictions freely Sameer is an advocate of open data and collaborative science. “It’s wonderful to see
available to all, DeepMind opened up scientists building on AlphaFold and using its predictions to advance their work. I
research avenues for scientists everywhere expect many new developments in the coming months and years.”
in areas such as drug discovery, neglected
diseases, improving crops, designing new
enzymes with new functions like processing
waste or degrading plastics.

Structure prediction for Trypanosoma cruzi protein. The parasite


causes Chagas disease. Credit: AlphaFold. Accession: Q4DCH1 12 13
EMBL-EBI HIGHLIGHTS 2021 DATA RESOURCES

Tracking the National and international A new tool allows bulk downloads outside
collaborations
pandemic in real time
of the web browser in a range of formats.
Insights revealed by the COVID-19 Data This enables researchers to keep their local
Platform and research at EMBL-EBI data in sync with the data available in the
The second year of the COVID-19 pandemic were used to highlight the spread of the portal, making local data analysis faster and
signalled a shift to using open data for Delta and Omicron lineages to national more flexible.
monitoring the spread of the viral lineages governments in a series of meetings and
circulating worldwide. As an established consultations. Preparing for future pandemics
source of SARS-CoV-2 open data, EMBL- A new wastewater surveillance working
EBI’s COVID-19 Data Platform continued to As the benefits of national and centralised group was set up to improve global efforts
increase in size and improve the depth of open data portals became clear, we to use viral RNA in wastewater samples
analysis it facilitates. continued to support researchers in to track case trends. The working group,
different countries with SARS-CoV-2 data comprising specialists from EMBL-EBI and
coordination. In 2021, two more national ten European countries, are collaborating
COVID-19 Data Portals were set up in to define a systematic approach to data
Turkey and Greece, alongside the existing collection and harmonisation, and to
“The COVID-19 Data Platform has gone from strength ones in Estonia, Italy, Japan, Norway, propose bioinformatics solutions for
to strength with increased user engagement and new Poland, Slovenia, Spain and Sweden. wastewater surveillance, which could be
features and tools; we’re now developing it for future crucial in future pandemics.
pandemic preparedness.” Improved search and analysis
Marianna Ventouratou, Platform Manager We increased the functionalities of the Building on the COVID-19 Data Portal, the
Platform by introducing new search and new BeYond-COVID (BY-COVID) initiative,
filtering tools, such as a new API, which led by ELIXIR, aims to improve European
enables researchers to adapt search readiness for future pandemics, enhance
COVID-19 DATA PORTAL PROVIDES OPEN ACCESS TO 11 MILLION RECORDS results to their specific use case. For ease genomic surveillance and rapid response
of use and reference, the World Health capabilities. EMBL-EBI is contributing
Organisation’s nomenclature search was its data coordination expertise to this
also added to the Platform. ambitious initiative set to generate valuable
new insights on infectious disease.

Viral and Gene expression Protein structure Biological


host genomic and function pathways and COVID-19 DATA PORTAL
sequences interactions

6.4 million 248,000 11 million


Drug targets and Imaging Literature web requests unique users from SARS-CoV-2
compounds since launch 191 countries data records

14 15
EMBL-EBI HIGHLIGHTS 2021 DATA RESOURCES

The foundations of a bioimaging revolution Assessing inherited disease risk


Bioimaging has advanced in leaps and In an effort to make bioimaging data easier Polygenic risk scores (PGS) are a new To bridge the gap and support the adoption
bounds in recent years, giving researchers to find and reuse, researchers from over 30 clinical tool for assessing a person’s of PGS scores when appropriate in clinical
a closer look at how life works at a cellular, institutions including EMBL-EBI published inherited risk for diseases such as Type 2 settings, EMBL-EBI teamed up with the
molecular and even atomic level. But the Recommended Metadata for Biological diabetes or coronary heart disease. The University of Cambridge to create the PGS
maximising the potential of bioimaging Images (REMBI). REMBI provides metadata score provides an estimate of an individual’s Catalog, an open database of published
data depends on robust data sharing standards across biological imaging domains risk for the disease, based on their DNA. polygenic risk scores. The teams also
infrastructure and standards. supporting FAIR sharing of image data. collaborated with NHGRI to create a
Despite the rise in PGS studies, there are minimal information framework for PGS,
In 2021, EMBL-EBI and collaborators As the volume and quality of bioimages inconsistencies in how the scores are published in the journal Nature1, which
continued to develop the BioImage increases – and the technologies become calculated and reported. Lack of adherence helps promote the validation of scores as
Archive (BIA), the open data resource for more accessible through initiatives such as to standards has hindered the translation of well as the reproducibility of the data. This
biological imaging data associated with EMBL’s Imaging Centre or Euro-BioImaging this important tool into healthcare. will be key for PGS usage in the clinic.
peer-reviewed publications. BIA anchors – the scientific community needs to
an entire ecosystem of related databases, cooperate to openly share the data to
ensuring efficient data storage, crosslinks maximise reuse. Protein structure data in one place
and indexing. BIA doubled in size in 2021
and already provides access to over one Another useful development for protein Having all protein structure data in one
petabyte of microscopy data from a suite structures was the launch of the place saves researchers time and makes it
of imaging technologies and biological 3D-Beacons Network. This community- much easier to access these models to get
domains. BIA is funded by the UK Research driven initiative acts as a one-stop valuable insights into health and disease,
and Innovation’s Strategic Priorities Fund. shop for experimentally-determined identifying drug targets and more.
and computationally-predicted protein
structures from different providers.

Before the network launched, scientists had


to search a range of data resources to find
out if any structure models existed for their
protein of interest. With the 3D-Beacons
Network, researchers can now find all the
available models from different providers in
a standardised format, with just one search.

The list of participating data providers is


impressive, with the Protein Data Bank
in Europe (PDBe), AlphaFold DB, SWISS-
MODEL, Protein Ensemble Database, Small
Angle Scattering Biological Data Bank,
Genome3D, and PDBe Knowledge Base
Mouse brain cells used in neurodegeneration research.
(PDBe-KB) joining in. The Central Hub for
Credit: BioImage Archive. Accession: S-BIAD7 the Network is located at EMBL-EBI.

1. Wand, H. et al. (2021). Improving reporting standards


16 for polygenic scores in risk prediction studies. Nature. DOI: 17
10.1038/s41586-021-03243-6
EMBL-EBI HIGHLIGHTS 2021 DATA RESOURCES

Unlocking proteomics data Genomic data for biodiversity


Proteomics is the large-scale study of As more proteomics data becomes In advance of the UN’s Biodiversity In 2021, the Darwin Tree of Life Project
proteins, and the proteome is a set of publicly-available, its reuse also increases Conference (COP15), and with megaprojects (DToL), which aims to sequence all
proteins produced by an organism or a dramatically, for example for big data such as the Darwin Tree of Life (DToL) eukaryotic species in the UK and Ireland,
system. Proteomics helps answer key methods. EMBL-EBI is focusing on and Earth BioGenome Project sequencing also made significant progress. Using a new
biological questions about health and disseminating and integrating proteomics hundreds of new species every day, the and improved rapid release function, we
disease, but can be challenging to study data from PRIDE into added-value data question about how genomic data are made annotated and released genomes for over
because the proteome is much more resources, namely UniProt (phosphorylation available to the scientific community is 90 species, including many butterflies and
complex than the genome and it isn’t data), Expression Atlas (protein expression), more pertinent than ever. moths, in Ensembl and the DToL Data Portal
constant – it changes from cell to cell Ensembl (proteogenomics data) and MGnify – the gateway to all the data generated by
and over time. (metaproteomics data). The aim is to make To address community calls for richer the project. The Portal is developed and
proteomics data useful and accessible to all metadata, the International Nucleotide maintained by EMBL-EBI, and boasts a new
The PRIDE data resource at EMBL-EBI life scientists for a range of applications. Sequence Database Consortium, which phylogeny browser that allows users to
is the world-leading proteomics data includes EMBL-EBI’s European Nucleotide see at a glance what data is available for a
repository, storing more than 80% of all Archive has made spatio-temporal particular clade, family, or genus. The DToL
public proteomics data worldwide. It also metadata mandatory for new submissions. dataset, which is openly available to all,
plays a leading role in the International The move aims to enrich the scientific is set to uncover significant insights in the
ProteomeXchange Consortium, which value of the data, especially for researchers field of biodiversity.
coordinates proteomics data provision working in biodiversity, ecology and
worldwide. infectious disease.

MICHAEL PARKIN

Better access to scientific research


Growing up, Michael Parkin enjoyed science so much that for
a long time he couldn’t choose between physics, chemistry
and biology. The scales finally tipped when Michael decided
to study Biology and Chemistry at Durham University (UK).

After a brief spell in scientific publishing, Michael joined the Literature


Services team at EMBL-EBI, where he sources new content for Europe PMC, the open-
access repository containing millions of biomedical research articles.

“I retrained as a Data Scientist while at EMBL-EBI, and now focus on making


scientific publications easier to find and explore,” Michael explained. “I also make
research machine-findable so funders and universities can track what research
they’ve enabled. During the pandemic, my team made COVID-19 preprints and grants
accessible through Europe PMC, helping researchers track the latest developments in
record time.”

18 19
EMBL-EBI HIGHLIGHTS 2021 RESEARCH

Research
Targeting drug-resistant
tuberculosis
As part of the international CRyPTIC
EMBL-EBI’s research groups focus on computational biology and making sense of consortium, the Iqbal group worked
vast, complex datasets. Our researchers work closely with experimental scientists on developing the world’s largest data
worldwide, increasingly tackling problems of direct significance to medicine and resource for the study of drug resistance
the environment. in tuberculosis (TB). The work informed
the World Health Organisation’s new TB
resistance catalogue.
Big data analysis shines light on the pandemic
CRyPTIC aims to reveal the genetic
Several EMBL-EBI research groups used Their work highlighted the value of genomic causes of drug resistant TB to enable the
bioinformatics to reveal valuable insights on surveillance combined with big data analysis. development of DNA-based diagnostics.
emerging SARS-CoV-2 lineages, differences Being able to see viral lineages side-by-side For this study5, the consortium used a new
in immune response and repurposing drugs and mapped to specific locations helped to experimental assay to measure the level
to treat COVID-19. understand the spread of the virus and to of drug resistance and to scan the genome
inform public health measures. of over 15,000 Mycobacterium tuberculosis
The Gerstung group, in collaboration strains from across the globe.
with the Wellcome Sanger Institute, used The Marioni group and collaborators used
genomic data to produce the most detailed single-cell sequencing to identify differences The resulting dataset, hosted at EMBL-EBI
analysis2 of how the COVID-19 pandemic in the immune response to COVID-19 and openly-available, can be used to probe
developed in England. They tracked over between symptomatic and asymptomatic the genetic causes of drug resistance in
70 virus lineages and were among the first people3. The research offers a view at an TB. The end goal is to apply these insights
to sound the alarm regarding the increased unprecedented resolution into how the body to DNA-based diagnostics, predicting the
transmissibility of Delta and Omicron. fights the disease, and reveals useful insights drug resistance of a patient’s TB strain and
to guide drug discovery and treatment. enabling personalised treatment.

The Leach group and collaborators used


EMBL-EBI’s ChEMBL database of small The genetics of age-related diseases
molecules to identify proteins that could be
good drug targets for COVID-19 therapies4. Researchers in the Thornton group used They revealed genetic links between
The work brought together experts from medical and genetic data from the UK diseases with the same onset profile,
a range of different areas, who quickly Biobank to investigate links between suggesting that they may share a common
responded to the urgent need for treatments. genetics and age-related diseases. Their cause. Their work is a prime example of
work produced the first data-driven the unique insights that UK Biobank data
2. Vöhringer H.S. et al. (2021). Genomic reconstruction of the SARS-
classification of 116 age-related diseases, combined with bioinformatics expertise can
CoV-2 epidemic in England. Nature. DOI: 10.1038/s41586-021-
04069-y including anaemia, deep vein thrombosis bring to light.
3. Stephenson E. et al. (2021). Single-cell multi-omics analysis of and depression6.
the immune response in COVID-19. Nature Medicine. DOI: 10.1038/
s41591-021-01329-2
4. Gaziano L. et al. (2021). Actionable druggable genome-wide
Mendelian randomization identifies repurposing opportunities for 5. Brankin A. et al. (2021). A data compendium of Mycobacterium tuberculosis antibiotic resistance. bioRxiv. DOI: 10.1101/2021.09.14.460274
COVID-19. Nature Medicine. DOI:10.1038/s41591-021-01310-z 6. Donertas M.H. et al. (2021). Common genetic associations between age-related diseases. Nature Aging. DOI: 10.1038/s43587-021-00051-5

20 21
EMBL-EBI HIGHLIGHTS 2021 RESEARCH

A closer look at gut viruses and skin bacteria


The Finn research group used A second paper8, in collaboration with FRANCESC MUYAS REMOLAR
metagenomics to reveal new insights about the National Institutes of Health, found
the human gut and skin microbiomes. One hundreds of previously unidentified viruses,
paper7 identified more than 140,000 viruses bacterial and fungal species living on human Using bioinformatics
living in the human gut, including a highly skin. Combining traditional lab methods to understand cancer
prevalent group that has never been seen with an innovative metagenomic sequencing
before. The work opens up new research approach, the researchers developed a A liquid biopsy is a non-invasive alternative to surgical
avenues to understand how human health comprehensive collection of reference biopsies, which enables scientists to characterise tumours
and disease are affected by viruses living in genomes for the human skin microbiome, using a blood sample. This is just one of the methods Francesc
the gut. which will support future investigations into Muyas Remolar, a Postdoctoral Fellow in the Cortes-Ciriano research group, is
skin health and disease. investigating, in the hope of maximising the potential of genomics in the clinic, with
special interest in early cancer detection.

Francesc was born near Valencia and studied genetics, bioinformatics and
biomedicine in Spain and Germany. He is using new technologies including Nanopore
long-reads and single-cell RNA sequencing data, to characterise cancer genomes,
understand how mutations occur, and why patients react differently to treatment.

“We’re harnessing knowledge from many data types,” said Francesc. “I would like
our work to translate into clinical applications that improve cancer diagnosis and
treatment.

“Joining a research group at EMBL-EBI is an amazing opportunity to engage with


world-leading experts in my field and beyond, and get an overview of the latest
science.”

“Our future work will aim to understand


what the different microbes living on
our skin are responsible for.”
Sara Kashaf, Predoctoral Fellow

7. Camarillo-Guerrero L.F. et al. (2021). Massive expansion of human gut bacteriophage diversity. Cell. DOI: 10.1016/j.cell.2021.01.029
8. Kashaf S.S. et al. (2021). Integrating cultivation and metagenomics for a multi-kingdom view of skin microbiome diversity and functions.
Nature Microbiology. DOI: 10.1038/s41564-021-01011-w

22 23
EMBL-EBI HIGHLIGHTS 2021 TRAINING

Training Building capacity in Latin America


The CABANA project aims to strengthen Despite the challenges that the pandemic
capacity in data-driven biology in Latin raised for this project, which was based
EMBL-EBI delivers world-leading training in bioinformatics and scientific service America. Led by EMBL-EBI and nine on moving people from one continent to
provision. We empower scientists at all career stages to make the most of research organisations from Latin America, another, all the objectives were achieved
biological data, and to strengthen bioinformatics capacity across the globe. The CABANA helped researchers in the region and surpassed, resulting in a strong network
team is part of EMBL’s International Centre for Advanced Training (EICAT). participate in large global consortia that future collaborations can build on.
equitably, and contribute to solving global
challenges, specifically biodiversity, food
security and communicable diseases.
Making training FAIR
The project, which ran between October
Bioinformatics and big data analysis are Finally, keeping track of training and 2017 and May 2022, was funded through
becoming essential skills for life science being able to refer back to it is essential the UK Research and Innovation’s Global
researchers. To support scientists at all for learners. The new EMBL-EBI training Challenges Research Fund. CABANA
levels and build capacity, the EMBL-EBI platform empowers researchers to design enabled the delivery of bioinformatics
Training team brings together a wealth their own training journey. It brings together workshops in Latin America, the creation of
of content and expertise, and makes curated collections from hundreds of online bespoke e-learning courses and train-the-
bioinformatics training more Findable, tutorials, webinars and courses, while also trainer activities to increase capacity. At
Accessible, Interoperable and Reproducible proposing new formats for handbooks, the heart of the project were secondments
(FAIR). making it easier to keep track of notes once that enabled Latin American scientists to
courses finish. visit other research institutes and embed
One of the biggest challenges is access to themselves in another lab. The project also
training. During the COVID-19 pandemic, In close consultation with the global supported seven collaborative research
the team redoubled its efforts to provide user community, EMBL-EBI continues to projects in the region.
easily-accessible online training and grow its training offering, and develop its
webinars. In 2021, the uptake of webinars competency hub with the aim of opening
exceeded all expectations, showing up bioinformatics to a wider, more diverse
the critical global need for increased group of people.
bioinformatics capacity.
“I really enjoyed the course and I will benefit
A second challenge is understanding what a lot in my future work! By far the best course
competencies are required for certain I’ve ever attended.”
career paths. To guide users in their career Trainee, Germany
and training choices, our Training team
is developing a Competency Hub, which “The expertise of the trainers on their topics was
lists in a clear and accessible way the one of the best things, compared to other courses
competencies individuals might need in I’ve taken. The state-of-the-art information
their career. provided will be most useful in the future.”
Trainee, Spain

24 25
EMBL-EBI HIGHLIGHTS 2021 TRAINING

CABANA ACHIEVEMENTS
“There’s no doubt now that the CABANA network will continue
and is committed to delivering training and pursuing its
challenge-led research goals. We’ve helped create a community
of data-driven biologists working across an entire continent,
and that’s a really powerful thing.”
Cath Brooksbank, EMBL-EBI Head of Training
39 28 9
secondments workshops e-learning
courses

01101 SARAH MORGAN


101
01
Not even a pandemic can stop training
“Bioinformatics is essential for anyone doing a PhD in the
life sciences – it unlocks access to so much data,” explained
11 800 9 Sarah Morgan, Training Programme Manager at EMBL-EBI.
train-the trainer scientists trained research
in bioinformatics innovation Born in Wales, Sarah studied biomedical science at Cranfield
courses (120
awards* University, where she also worked as a lecturer before joining
people trained)
EMBL-EBI. Sarah manages the day-to-day running of the EMBL-EBI training
*of which the biggest one sequenced SARS-CoV-2 genomes and identified strains in Latin America programme, and has overseen its development into one of the most extensive
bioinformatics training offerings in the world.

“The pandemic was a big incentive to create a framework for running our courses
virtually. The team was fantastic in coming up with ideas and making things happen.
Interest in our courses spiked during the pandemic, and we developed a range of
tools that we’ll continue to use in the future. We also launched a course on single-
“Projects like CABANA allow people in Latin America to build cell RNA sequencing, which has proven very popular.”
further bonds between bioinformatics groups, to be a part of
this community and carry out research using bioinformatics.”
Guillermo Rangel-Pineros, Secondee from University of Los
Andes, Columbia EMBL-EBI TRAINING IN 2021

“For teaching, you need different views and different expertise


and we get all this from CABANA.” 17 live training Eight new online 110,000 people 497,000 unique IP
Alfredo Herrera-Estrella, CABANA Co-investigator, activities reaching tutorials, bringing watched our addresses accessed
National Laboratory of Genomics for Biodiversity, Mexico 486 delegates the total to 83 webinars EMBL-EBI training
website

26 27
EMBL-EBI HIGHLIGHTS 2021 STRATEGIC PARTNERSHIPS

Strategic
EMBL-EBI INDUSTRY PROGRAMME

partnerships Founded in 26 member companies Six workshops with


1996 from pharma, agri-food 900 participants
As biology becomes more data-driven, EMBL-EBI is developing strategic and consumer goods in 2021
partnerships with private and public sector organisations to help them maximise
the potential of bioinformatics.

Building strong ties with industry Streamlining access to biomedical data


EMBL-EBI’s Industry Programme is a In 2021, EMBL-EBI also consolidated its As part of the Global Alliance for Genomics DUO is set to streamline researcher access
subscription-based initiative, aimed at ties with small and medium companies and Health (GA4GH), EMBL-EBI researchers to biomedical datasets, making it easier to
global companies that make significant use (SMEs) through the inaugural edition of develop genomic data standards and tools navigate the complex landscape of sensitive
of EMBL-EBI data resources as part of their the Bioinformatics for BioBusiness event, to enable responsible genomic data sharing biomedical data. Better and faster access
research and development activities. organised in collaboration with Medicines within a human rights framework. to biomedical data within a safe and secure
Discovery Catapult. This was the perfect system will speed up genomic research, and
“The EMBL-EBI Industry Programme has been opportunity for SMEs to get up to speed with In 2021, with colleagues from the Broad the use of genomics in a clinical setting.
by far the strongest public-private enterprise the latest developments in bioinformatics Institute, we developed the Data Use
to shape the relationships between academia and get bespoke specialist advice from Ontology (DUO), which provides standard
and pharma.” our experts in genome annotation, single- terms and definitions that can be used to
Bertram Weiss, Head of Research Product cell sequencing, metagenomics and drug describe biomedical data. Data holders
Platform, Bayer AG discovery. can use DUO to describe and tag datasets,
making them automatically discoverable to
“We use ChEMBL all the time to access public researchers based on the intended usage.
domain chemical structures. Recreating that
would not be possible for a single company.”
Impact survey respondent, UK

28 29
EMBL-EBI HIGHLIGHTS 2021 CULTURE AND INFRASTRUCTURE

Coordinating bioinformatics in Europe


Culture and
infrastructure
ELIXIR is an intergovernmental organisation As an ELIXIR Node, EMBL-EBI supports the
that coordinates, integrates and sustains coordination of biological data provision
bioinformatics resources from different throughout Europe. Many EMBL-EBI data
European providers – including EMBL-EBI – resources are included in ELIXIR’s Core
and enables users in academia and industry Data Resources list – a repository of
to access services that are vital for their databases deemed to be fundamental to the
research. life sciences. EMBL-EBI provides computing The success of EMBL-EBI relies on its collaborative culture, diverse workforce,
power, training, and standardisation and robust technical infrastructure. In 2021, some of the key focus areas were
activities to the ELIXIR community. We equality, diversity and inclusion, environmental sustainability, and consolidating
are also collaborating with ELIXIR on
existing technical infrastructure.
the BY-COVID project, which provides
comprehensive open data on SARS-CoV-2
and other infectious diseases across
scientific, medical, public health and policy Equality, diversity and inclusion
domains.
Equality, diversity and inclusion (EDI)
became embedded across all EMBL sites,
including EMBL-EBI, with the establishment
of the EMBL EDI Office, a cross-EMBL
volunteer forum and the EDI governing
GAIA CANTELLI
body.

The strategy behind the science The results of an all-staff EMBL survey,
The love for science can be sparked by the smallest conducted in March 2021, about
interactions. For Gaia Cantelli, who is an EMBL Strategy perceptions of EDI across the organisation
Officer, the spark was a collection of illustrated books she informed EMBL’s first Equality, diversity
had growing up – she loved how they were both beautiful and inclusion strategy, now in its early
and logical. implementation phase.

Gaia was born in Bologna, Italy, and studied cell and molecular biology in The four axes of the strategy focus on
Cambridge and London before doing a postdoc and teaching scientific writing at attracting and promoting underrepresented
Duke University in the USA. groups at EMBL, ensuring that EDI
principles are built into EMBL’s systems and
“My current role involves strategic planning, implementation and reporting,” processes, developing a model of inclusive
explained Gaia. “My team works with all the parts of EMBL to align everyone to leadership in science, and engaging on EDI
a common direction of travel. We’re currently focusing on the implementation of with EMBL’s many collaborators.
EMBL’s new and ambitious programme – Molecules to Ecosystems – which aims to
study life in context. In 2022 we’re going to really see the programme come to life, Read more about
as work on new initiatives ramp up.” EMBL’s Equality,
Diversity, and
Inclusion strategy
30 31
EMBL-EBI HIGHLIGHTS 2021 CULTURE AND INFRASTRUCTURE

Environmental sustainability Public engagement


To show EMBL’s commitment to sustainable The assessment identified three pillars of A key part of EMBL-EBI’s public engagement
research, and help reduce its impact sustainability at EMBL: environmentally- strategy is to connect with audiences to
on the environment, the organisation responsible research, environmentally- understand their needs and to develop
commissioned a materiality assessment, relevant research and promoting mutually-beneficial relationships. The
which invited input from staff, leadership sustainable science. The organisation has audience scoping project, in collaboration
and key external stakeholders. now also set ambitious environmental with Graphic Science, was the first step
targets for the next decade. on this journey. We connected with five
community intermediaries and conducted
a teacher focus group. One project has
already been delivered, and several new
activities are in development.

Find out more about our


public engagement activities

Supporting staff during the pandemic


As the pandemic continued into the second EMBL-EBI recruited and onboarded
year, EMBL-EBI’s Business Continuity over 200 new members of staff in 2021,
Planning team supported the institute to and continued to support staff with the
navigate the uncertainties of returning to uncertainties and challenges of working
the office when possible. EMBL laid the during the COVID-19 pandemic.
groundwork for launching a hybrid working
pilot, which will enable staff to adopt more
flexible working patterns, while continuing
to tap into the creative and collaborative
culture of the campus.

Read about
EMBL’s
sustainability
strategy

32 33
EMBL-EBI HIGHLIGHTS 2021 CULTURE AND INFRASTRUCTURE

A new building
ANIA NIEWIELSKA
To increase office space, we built the
Modular Accommodation, which houses Developing creative tech solutions
100 staff. As part of the Wellcome Genome After studying computer science in the city of Gliwice,
Campus expansion, EMBL-EBI is set to Poland, Ania Niewielska spent over ten years developing
construct a third permanent building, which software solutions for the banking sector.
will accommodate 250 members of staff.
The new building is funded by an award from Following a life-long interest in the life sciences, she joined
UK Research and Innovation and Wellcome. EMBL-EBI and currently works as a Lead Software Engineer. After
The planning application for the building has working on several EMBL-EBI projects, Ania has recently taken over management
been accepted, and significant progress has of the institute’s Resource Usage Portal, which helps internal teams get the
been made through staff consultation and technical resources they need for their projects. This covers virtual machines,
outlining the design concept. databases and the cloud.

Construction is planned to begin in late “I like working on real use cases, interacting with users and stakeholders, asking
2022, with the building set to open in 2024. them questions and drawing out what problems they’re really trying to solve, then
finding the best tech solutions for them,” said Ania.

“The thing I enjoy most about EMBL-EBI is the campus – it’s a safe space and a
great environment to exchange ideas.”
Improved technical infrastructure
EMBL-EBI’s total data footprint has The institute has also updated its website,
been growing year on year, and is now bringing the look and feel in line with the
approaching 400 petabytes of raw storage. new EMBL visual framework, and
We’re currently focusing our efforts on creating bespoke microsites for
consolidating the storage infrastructure every team, enabling them to
by removing older temporary data that showcase their work and
no longer has value, or that is duplicated collaborations.
elsewhere, in preparation for new data
types and future growth.

Alongside this, our teams have also been


tapping into the potential of cloud storage
and computing, with two new strategic
partnerships with Google Cloud and Amazon
Web Services announced in 2021. This is
the next step in EMBL’s hybrid multi-cloud
strategy, and was made possible through
funding from UK Research and Innovation.

34 35
EMBL-EBI HIGHLIGHTS 2021 NEW LEADERSHIP

New leadership Tudor Groza


Tudor joined EMBL-EBI to lead the newly-formed
Phenomics team, which focuses on developing data
Meet the group and team leaders who joined EMBL-EBI in 2021 acquisition and integration services to support
advances in computational phenotyping for rare and
complex disorders.

Mary Barlow Melissa Harrison


Mary was promoted to Head of Campus Operations and Melissa joined EMBL-EBI as a Team Leader for Literature
Capital Projects. Her team manages a range of crucial Services. Her team runs Europe PMC, an open science
functions, including facilities, new building development, platform that enables access to life science publications
process improvements, and oversight of operational and and preprints.
capital projects.

Sihem Bennour Peter Harrison


Sihem is EMBL-EBI’s new Head of Human Resources. Peter is the new Team Leader for Genome Analysis.
Her team supports EMBL-EBI staff during every stage His team develops the regulatory components, core
of employment, and promotes an enabling, inclusive, infrastructure and web applications for the Ensembl
international working environment for all. project, a world-leading browser for vertebrate genomes.

Sarah Dyer Matthew Hartley


Sarah is the new Non-vertebrate Genomics Team Leader. Matthew leads the BioImage Archive team, which
She brings a wealth of experience working with plant and develops and operates EMBL-EBI’s broad scope bioimage
crop genomes, and leads three Ensembl teams – Plants, data resource. His team’s experience crosses biological
Metazoa and Outreach. imaging, data management and software development.

36 37
EMBL-EBI HIGHLIGHTS 2021 FACTS AND FIGURES

Facts and figures Financial figures


We are grateful for the continued support In 2021 they helped us maintain our data
of our member states and other funding resources, conduct vital research and training,
bodies. as well as respond to the requirements of the
international scientific community.
Staff numbers
Personnel categories
in 2021 Funding sources Capital infrastructure
investors
EMBL-EBI receives its funding through Member state
three main funding channels. The operating contribution
15%
expenditure is funded by either EMBL 40%
621
Staff members
member state contributions (40%) or from
external funding bodies via grant awards
(45%). EMBL-EBI also receives significant
47
Postdoctoral
40
Predoctoral
capital awards for investment in buildings
45%
and data infrastructure (15%).

708
fellows fellows
Grant
funders

Gender distribution Senior roles Operating expenditure Training


Admin,
management
of staff in 2021 held by women The total operating expenditure of EMBL- Technical and estates
support
EBI in 2021 was €92.6m from all funding 6%
36% 39% sources. Scientific services accounted for
15%
14%
30% 33%
49% of the expenditure in support of EMBL-
EBI’s data resources, with a further 17%
23% allocated to research and 6% to external 17% Research

training. Approximately 14% of costs were


on technical support which maintains and
develops the technical/IT infrastructure,
43% 57% and 14% on management, administration,
49%
Female Male 2017 2018 2019 2020 2021 and estate costs.
Scientific
services

38 39
EMBL-EBI HIGHLIGHTS 2021 OUR FUNDERS

Capital investment
EMBL-EBI’s Data Infrastructure for Life
Sciences programme began in 2019 and
The second part of the programme is to
increase the office space available to
Our funders
runs to 2024. The multi-year programme EMBL-EBI. The Modular Accommodation
oversees £70 million of capital investment, was completed in 2021 providing We would like to thank our funders for their continued support. We thank the EMBL
of which £45 million comes from the UK accommodation for 100 EMBL-EBI staff. member states as well as our other funders listed below.
government’s Strategic Priorities Fund via This was funded with contributions from the
UK Research and Innovation (UKRI). This UKRI and EMBL. The UKRI and Wellcome
enables the transformation of storage, Trust have funded a longer-term office
compute and networking facilities as the space solution – a third permanent building • Alzheimer’s Research UK • Fonds National de la recherche Luxembourg
demand for our data resources continues to called the Thornton Building (page 34).
• Biotechnology and Biological Sciences • Global Challenges Research Fund
grow, and helps to establish the BioImage Research Council
Archive, a central, open data resource for Expenditure relating to these capital • Gordon and Betty Moore Foundation
• Bill & Melinda Gates Foundation
biological images (page 16). investments amounted to €14.5m during • Medical Research Council
the 2021 EMBL financial year. • British Council
• NF Research Initiative
• Broad Institute of MIT and Harvard
• National Cancer Institute
• Cancer Research UK
• National Institutes of Health
• Connective Tissue Oncology Society
Strategic partners and funders • Chan Zuckerberg Initiative
• National Institute for Health Research

EMBL-EBI works closely with strategic • Novo Nordisk


• European Commission
partners and funders from across the globe. • National Science Foundation
Alongside funding from EMBL member 1 Luxembourg • Engineering and Physical Sciences
• Sarcoma Foundation of America
states, in 2021 we received €39.5m in
88
Research Council
research and service grant funding from • Wellcome
UK • Economic and Social Research Council
23 funders (page 41) providing support for
186 projects. 54
Belgium
(European
Commission)

41
USA
1 Denmark

Number of grants received by EMBL-EBI


in 2021 by funding body location.

40 41
EMBL-EBI HIGHLIGHTS 2021 EMBL-EBI LEADERSHIP

EMBL-EBI Directors’ Office Administration & Operations

leadership
Director Director Rachel Curran
Rolf Apweiler Ewan Birney

Associate Directors of Services

Johanna McEntyre Paul Flicek

Head of Research
Research Facilities, Finance Grants Human Operational and
Management Health & Safety John Barron Emma Sinha Resources Capital Projects
John Marioni
Office Andrew Cornell Sihem Bennour Mary Barlow

Protein Systems & Genes, Proteins & Molecular &


Functional Molecular Molecular Molecular Chemistry Literature
Genomes Structures & Mathematical Genomes Protein Cellular
Genomics Archives Atlas Systems Services Services
Chemistry Biology & Variation Families Structure

Samples, Molecular
Vertebrate Functional Protein Sequence Molecular & Chemical Literature
Isidro Phenotypes Networks
Ewan Birney Alvis Brazma Alex Bateman Genomics Genomics Resources Cellular Structure Biology Services
Cortes Ciriano & Ontologies Henning
Paul Flicek Alvis Brazma Alex Bateman Gerard Kleywegt Andrew Leach Melissa Harrison
Helen Parkinson Hermjakob
Gene Protein Databank
European Non-vertebrate Sequence Metabolomics
Pedro Beltrao Expression in Europe
Paul Flicek Rob Finn Moritz Gerstung Nucleotide Archive Genomics Families Claire O’Donovan
(Outgoing) Irene Sameer Velankar
Guy Cochrane Sarah Dyer Rob Finn
Papatheodorou
Functional Cellular Structure
EGA & Archive Variation Protein Function and 3D Bioimaging
Evangelia Genomics
Nick Goldman John Marioni Andrew Leach Infrastructure Annotation Development Ardan Patwardhan
Petsalaki Development
Thomas Keane Fiona Cunningham Maria-Jesus Martin
Ugis Sarkans
Archival
Eukaryotic Proteomics Protein Function
Irene Infrastructure
Zamin Iqbal Janet Thornton Virginie Uhlmann Annotation Juan Antonio Content
Papatheodorou and Technology
Kevin Howe Vizcaíno Sandra Orchard
Tony Burdett
KEY: RESEARCH GROUPS
Genomics BioImage Archive
Phenomics Technology Matthew Hartley SERVICE TEAMS
Thomas Keane
Tudor Groza Infrastructure
Andy Yates TECHNICAL SERVICES

Genome
Analysis
Peter Harrison

Technical Services

Technology & Systems Web Web Systems Software


Training External Relations Communications Industry Partnerships Open Targets Science Applications Production Development Infrastructure Development
Cath Brooksbank Lindsey Crosswell Gemma Wood Andrew Leach Ian Dunham Integration Andy Cafferkey Rodrigo Lopez Geetika Malhotra Tim Dyce & Operations
Steven Newhouse Sarah Butcher

42 43
EMBL-EBI HIGHLIGHTS 2021

Our governance
EMBL-EBI is part of the European Molecular Biology Laboratory (EMBL), an
intergovernmental organisation with 27 member states, one associate member state
and two prospect member states. EMBL is led by a Director General, Edith Heard,
appointed by the EMBL Council.

The EMBL Council is composed of In 2021, EMBL-EBI was led by joint


representatives from all member states of Directors Rolf Apweiler and Ewan Birney,
the Laboratory and determines its policy with the support of two Associate Directors
in scientific, technical and administrative of Services, Paul Flicek and Johanna
matters by giving guidelines to the Director McEntyre, the Head of Research, John
General. The Council ensures that the Marioni, and the Head of Administration and
financial requirements of the agreement Operations, Rachel Curran.
establishing EMBL, and of the agreements Imprint
with host member states are complied with.
Publisher
EMBL-EBI

Portrait image credits


p. 35 (Ania Niewielska portrait): Maciej Grzywocz

All portraits not listed here are from personal collections or


taken on behalf of EMBL by Jeff Dowling and Mary Todd Bergman.

Other image credits


p. 8 (AlphaFold protein) and p. 12: DeepMind
p. 8 (virus and PGS Catalog): Spencer Philips/EMBL
p. 8 (CRyPTIC project): Karen Arnott/EMBL
p. 9 (bacteria): Spencer Philips/EMBL
p. 9 and p. 16 (REMBI): BioImage Archive
p. 9 and p. 25 (CABANA project): Jeff Dowling/EMBL
p. 9 (training participants): Phil Mynott
p. 9 and 34 (servers): Google
p. 32 (composition for COP26): Kristina Arató/EMBL

Printer
Kingfisher Press. www.kingfisher-press.com

44
European Bioinformatics Institute (EMBL-EBI)
Wellcome Genome Campus
Hinxton, Cambridge, CB10 1SD
United Kingdom

y www.ebi.ac.uk
C +44 (0)1223 494 444
E comms@ebi.ac.uk

T @emblebi
F /EMBLEBI
Y /EBImedia
L /company/ebi/

EMBL-EBI is a part of the European Molecular Biology Laboratory.

A digital version of this publication is available on


www.ebi.ac.uk/about/our-impact

EMBL member states and associate member states: Austria,


Belgium, Croatia, Czech Republic, Denmark, Finland, France,
Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy,
Lithuania, Luxembourg, Malta, Montenegro, Netherlands, Norway,
Poland, Portugal, Slovakia, Spain, Sweden, Switzerland,
United Kingdom, Australia

Prospect member states: Estonia, Latvia

You might also like