Professional Documents
Culture Documents
Authors
ELIXIR
Dr Katharina B Lauer
Andrew Smith
Dr Niklas Blomberg
Xènia Pérez Sitjà
Callum Talbot-Cooper
Contents
Abstract6
Spotlight on talent 23
Abel Ureta-Vidal
Maria Chatzou Dunford
About ELIXIR 27
Methods 28
6
Abstract
‘In the long history of Openness in the life sciences
humankind (and animal facilitates collaboration
kind, too) those who learned and increases innovation
to collaborate and improvise potential for industry
most effectively have Undoubtedly, at one point in their career, every
prevailed.’ scientist will come across the work of Charles
Darwin and his groundbreaking discoveries
and theories. Already in 1859, he argued that
Charles Darwin collaborative use and reuse of existing resources
The Origin of Species by Means of Natural Selection, or the enable novel outcomes. His work has been regularly
Preservation of Favoured Races in the Struggle for Life, 1859 used to explain success through innovation and
entrepreneurship.
“
Open data enable innovation
in the bioinformatics industry
Research data is a major asset for life sciences
industry2,3,4. Exploiting these data drives
development within health, agriculture and
biotechnology sectors, thereby leading to economic
and societal benefits. For example, to accelerate
drug development, produce efficient washing ‘The more freely available data that is
powder, and even design better feed for livestock compatible with other data, the better for us.
and pets. Without open data resources, large companies,
such as pharmaceuticals, would be hampered
Such innovative life sciences projects require the in their work but would not cease to exist. In
essential use of open data, open-source software contrast, small and medium-sized enterprises,
and common standards. Together, open resources as well as the startups, would be severely
allow industry, academia and the wider community impacted as in many cases their business models
to create novel services and products. are dependent on the availability of open data
resources.’
For today’s digital life sciences, data standards
and data quality are major bottlenecks for data
accessibility and data sustainability, thereby Hans Garritzen
essential enablers of effective data utilisation. Sales Director, Medisapiens
Research infrastructures, such as ELIXIR, ensure
that life science innovators and researchers globally ‘I think Europe, by investing in that framework
can utilise open data. They facilitate access to open [ELIXIR], is doing the right thing to support the
resources such as databases, repositories, software long term sustainability of the Life Sciences
tools, registries, compute and cloud resources, while knowledge economy. Imagine if all the data
promoting standards and enabling collaboration resources that are freely available today would
through training and networking events5. disappear from tomorrow. It would be tough
for the life sciences companies to sustain their
This report sheds light on the importance of
innovation. They would have to work really,
continued access to open data and related
really hard and spend a large portion of their
resources, along with their underpinning
Research and Development (R&D) spending on
infrastructures that contribute to a thriving life
replacing that common good.’
science ecosystem.
Florent Villiers
Lab Leader for Biochemistry & Data Science, Bayer
Physical infrastructure
e.g. transport, accommodation/offices, communications
6 Constantinides, P., Henfridsson, O. & Parker, G. G. Platforms and infrastructures in the digital age. Inf. Syst. Res. 29, 381–400 (2018).
7 Adner, R. & Kapoor, R. Value creation in innovation ecosystems: How the structure of technological interdependence affects firm performance in new technology
generations. Strateg. Manag. J. 31, 306–333 (2010).
8 Russell, M. G. & Smorodinskaya, N. V. Leveraging complexity for ecosystemic innovation. Technol. Forecast. Soc. Change 136, 114–131 (2018).
9 Russell, M. G., Huhtamäki, J., Still, K., Rubens, N. & Basole, R. C. Relational capital for shared vision in innovation ecosystems. Triple Helix 2, 1–36 (2015).
9
0 20 40 60 80 100
Figure 1. Number of bioinformatics-related start-ups and SMEs identified per country within
Europe. Orange hexagons represent regions with high concentrations of companies (countries
shown where data was available).
year, with over 75% having a science or technical companies face a talent-acquisition challenge.
degree – people and expertise matter. They need to employ experts in specialised life
science areas and, at the same time, bring in
Take Arcticzymes, located in Tromsø10,11. experts in cutting-edge technologies, such as
Arcticzymes extensively uses public databases machine learning and artificial intelligence.
to identify novel cold-adapted enzymes from Access to people, personal networks, partners
arctic organisms, and formulates these for use and collaborators that provide open resources
in molecular research, in vitro diagnostics and remains an important factor for the concentration
therapeutics. Companies that base their business of bioinformatics ventures in certain regions
model on digital technologies and data can be (Figure 1).
based anywhere – but our research has shown that
being close to an innovation ecosystem helps many In this report, we will explore the innovation
to thrive and go further. Because of the knowledge- ecosystems in Europe that harbour most
intensive nature of the industry, bioinformatics bioinformatics SMEs.
Cambridge
12
13 Regional Innovation Scoreboard, Berlin – Internal Market, Industry, Entrepreneurship and SMEs – European Commission. https://ec.europa.eu/growth/tools-databases/
regional-innovation-monitor/base-profile/berlin (visited in 2019).
14 Startup Genome – Berlin. https://startupgenome.com/ecosystems/berlin.
15 Life Sciences Report 2019/2020 – Berlin Partner for Business and Technology https://www.berlin-partner.de/fileadmin/user_upload/01_chefredaktion/02_pdf/
publikationen/Life-Sciences-Report_en.pdf (2020).
Berlin
13
Barcelona
In 2019, Catalonia had about 7.5M residents, Among the many bioinformatics ventures that
making it the second-most populous region have launched in Barcelona are Seqera Labs and
in Spain, accounting for 16.1% of the total Minoryx. Seqera Labs provides an open-source
population16. At that time, Catalonia reached a and cloud-based high-performance computing
GDP of €228.68 billion. solution that helps researchers analyse large
biomedical datasets and manage their analysis
The Barcelona ecosystem represents a dense workflows. Minoryx focuses on the discovery and
and industrial community of SMEs with an active development of novel therapies for severe, orphan
presence of large multinationals, especially genetic diseases of the central nervous system,
in the biomedical, agro-food, automotive and a group of rare diseases often characterised by
telecommunications sectors. Barcelona is a neurodegeneration and no or few treatments
growing ecosystem for new ventures, with available.
particular strengths in gaming and life sciences,
the majority of which in the digital health domain17.
“
universities and more than 40 research institutes,
with at least 15 focusing on the life sciences,
provide scientific research and training. According
to the ‘Regional Innovation Scoreboard 2019’,
Catalonia is classified as a ‘Moderate+ Innovator’
that exceeds innovation performance compared
with the rest of Spain and is continuously growing.
Ventures start and grow in more than 14 science
parks.
16 Region Innovation Scoreboard, Catalonia – Internal Market, Industry, Entrepreneurship and SMEs – European Commission. https://ec.europa.eu/growth/tools-
databases/regional-innovation-monitor/base-profile/catalonia (visited in 2019).
17 Startup Genome – Barcelona. https://startupgenome.com/ecosystems/barcelona
Mobility of science
entrepreneurs for
bioinformatics
venture creation
Building an efficient, highly talented founder team is a
critical factor for forming successful ventures.
Figure 2. Some countries in Europe excel in being ‘sticky’. These countries show moderate and neutral entrepreneur
export ratios – close to one –, therefore excelling at retaining talent. (only shown countries where data were available)
16
76%
stated that without data shared on open
Creating
repositories, they would not be able to offer
their product or service.
Organism
Protein
Diagnostic
Genomics tests
Data
Public data
New drugs
Transcript and vaccines
Publications Cosmetics
Patents
Animal food
Variants
Active Washing
substances powders
Business internal
appropriation strategies
Medical
using expert knowledge
devices
or technologies
Proprietary data
such as AI/ML
Technology
Data & solutions
other
resources
Integration and
Value chain for
data analysis
Selection processing Transformation Mining Interpretation
Vocabulary Generation
Configurable Model
Care Service
Workflow as a Service
Figure 4. Appropriation strategies of companies along the data analysis value chain
These openly available resources are mostly Depending on what stage(s) a company operates
combined with proprietary resources created along the value chain, it can employ different
by bioinformatics ventures themselves or their appropriation strategies18. A firm can either
clients in the private sector. Outputs of those focus on one or more core activities or include
companies can vary considerably and range from the whole value chain in its business strategy.
drug discovery or diagnostic tests, to shaping new Particularly at early stages, our survey has shown
technology solutions or even domestic products that bioinformatics ventures tend to provide a full-
such as washing powders. service model for clients on a custom and hardly
scalable basis. The majority of mature ventures
With more open and proprietary life science data rely on technology to either provide a product at a
resources available, the bioinformatics industry particular stage, for instance, to provide access to
has brought forth companies on multiple stages of curated repositories of open and proprietary data,
the value chain. Figure 4 provides an overview of or generate tools for creating ontologies that will
the value chain of businesses making use of open integrate new data resources and answer specific
life science resources: research questions.
1. Collecting, selecting, and providing data To provide a scalable product along the entire
resources. value chain, some companies created platforms
and private infrastructures that help customise
2. Integrating data and pre-processing these
otherwise automated workflows to process raw
resources for later use either by the company
data into visualisations and insights.
itself or its customers.
18 Rothe, H., Jarvenpaa, S. & Penninger, A. How do entrepreneurial firms appropriate value in bio data infrastructures: an exploratory qualitative study. Res. Pap. (2019).
18
Core activities Using this approach, Scailyte relevant single-cell data from
can identify molecular profiles the open domain for their
Founded in 2017, Swiss
(biomarkers) from specific analysis.
company Scailyte uses
cell types associated with a
single-cell data and artificial Open data resources, including
disease. These biomarkers
intelligence (AI) to identify annotations, genomes and
are of value for developing
novel disease biomarkers. transcriptomes, play an
early diagnostic tools and may
Unlike methods that capture important role in providing a
represent potential therapeutic
average measurements of bulk baseline for single-cell data
targets.
populations of cells, single- processing. During this data
cell technologies generate The company has raised about processing stage, Scailyte uses
data at the level of individual €5.5M. The venture relies on several community-developed
cells, offering unparalleled highly specialised knowledge tools.
insight into biological diversity to make effective use of data.
between cells. Single-cell data Of 17 team members, 65%
contains biomedically relevant hold a PhD in the life sciences, Position along the value
information not captured by and a third have a computer chain
other technologies but requires science or bioinformatics Scailyte operates across the
innovative computational background. entire length of the value chain
approaches to extract this
– performing data selection,
information.
pre-processing and analysis,
How do they use open and data interpretation to
At Scailyte, single-cell data is data?
fed into their proprietary AI identify profitable novel
developed pipeline (ScaiVision) While collecting proprietary insights. For Scailyte, custom
to associate patterns in single- data themselves and with data inputs are dependent on
cell data with disease status. research groups, Scailyte takes the customer.
Integration and
Value chain for
data analysis
Vocabulary Generation
Configurable Model
Care Service
Workflow as a Service
19
Core activities such as big data analysis and has been performed for 120
enterprise-wide search. publicly available ontologies. In
Founded in 2012, UK-company
addition, SciBite works closely
SciBite specialises in ontology Elsevier acquired the company with ontology communities
management and text analytics. in 2020 for over €70M. Similar to feedback issues they may
Their software tackles the to Scailyte, the venture relies on discover as part of their effort to
challenge to extract, analyse highly specialised knowledge. improve these public resources
and integrate, knowledge From 59 team members, the continually.
locked within unstructured company equally employs life
biomedical text to release its scientists (31%) or computer
full potential. scientists and bioinformaticians Position along the value
(38%). chain
Using an ontology-led approach
powered by machine learning, Using their specialised domain
their software can quickly knowledge, SciBite operates
How do they use open
extract scientific terminology at a specific stage along the
data?
from unstructured text and value chain – data integration,
transform it into a rich, Publicly available ontologies pre-processing and data
semantically annotated form one of the key pillars of transformation. Their suite of
machine-readable dataset. SciBite’s software solutions. APIs can be picked up by large
Ready-made application SciBite enriches these corporations and integrated
programming interfaces ontologies through manual into existing infrastructures.
(API) integration of data into curation and machine learning
private infrastructures. The to make them more applicable
resulting data can be used to industry-specific challenges.
for downstream processes, This enrichment process
Integration and
Value chain for
data analysis
Vocabulary Generation
Configurable Model
Care Service
Workflow as a Service
20
Spotlight on
open data
Open data resources are resource stands out as most
essential for today’s life science important). Figure 6 shows how
community in both academia and ventures combine data from
industry. They act as archives of different databases.
experimental research data, inform
future research studies’ direction, They combine open data
and help link and bring together resources, often with proprietary
a vast wealth of interoperable information, to create value for
information. customers. In doing so, ventures
combine open data resources,
Many new databases are essential often with proprietary information.
to the broad research community, To enable new insights to be
but only a few have a sustained garnered, or to facilitate a diverse no growth
funding model, as highlighted in a range of customers to gain new growth of turnover and
recent study. In an 18-year period, knowledge, an assortment of emplyees
more than 60% of life science data needs to be connected and
research databases ceased to exist, analysed. We, therefore, found that
and a further 14% were no longer companies that grew more rapidly
updated19. were among those who used many
different resources in combination 40
Research infrastructures such (Figure 5).
as ELIXIR work to increase the
long-term sustainability of their
resources and carry out targeted
actions so that these feed into 30
Interestingly all companies in the Figure 5. Percentage of companies reporting either growth of turnover and number
survey indicatezzwas relevant of employees or no change, related to the number of open data repositories used by
a company.
to the business (no individual
19 zAttwood, T. K., Agit, B. & Ellis, L. B. M. Longevity of Biological Databases. EMBnet.journal 21, (2015).
21
Spotlight on talent
Who are the founders?
It is indisputable that education, knowledge and skills translate into
economic value. Human capital has long been regarded as one of the
driving forces of innovation and creating entrepreneurial actions. A
deep understanding of science and technology is needed to develop
a bioinformatics venture. For that reason, it is vital to understand
where talent in Europe is forged. Universities and academic institutes
increasingly equip academics with entrepreneurial competencies.
Abel
Ureta-Vidal
1999 I was very fortunate to join the Human Genome Project straight
University in after my PhD. I was put in charge of setting up all the data
France and PhD
at Universite de
analysis to predict genes on the sequence of chromosome 14.
Paris
1999-2001 For a French man, it was a dream
Genoscope job, i.e. a job for life in an academic
– French
contribution
institution, but I wanted a bit more
to the human international exposure and came to
genome project Cambridge and joined the Ensembl
2001-2007 project at EMBL-EBI.
EMBL-EBI –
Project leader
at Ensembl
(Cambridge)
Maria
Chatzou Dunford
Born in Greece I am originally from Greece, but I have spent two-thirds of my life
abroad. Every scientist has a similar trajectory – you move for your
work.
2016-2017
Postdoctoral
researcher at CRG
2017 I did move to Cambridge when I founded Lifebit. When you are trying
Founder of Lifebit to build a deep-tech company, then you need to be very close to the
(Cambridge)
pharmaceutical clients, hospitals, biotech companies because they’re
our predominant clients. Then, you need to be close to a cluster of
big research institutions – take the Golden Triangle between London,
Cambridge and Oxford.
27
About ELIXIR
ELIXIR aims to make it easier for researchers and Funders and policymakers will, in turn,
entrepreneurs in academia and industry to find and acknowledge that providing free bioinformatics
share data, exchange expertise and agree on best resources represents a public good and support
practices to move science forward and accomplish innovation and the transition to a knowledge-based
breakthrough discoveries. economy. Activities from the industry programme
include networking events (e.g. the Innovation
and SME Forum), funding mechanisms for small
scale collaborative projects, or opportunities for
knowledge exchange.
28
Methods
This report has been based on the qualitative-
exploratory field study20 conducted in 2018-2019 to
understand the challenges that digital entrepreneurs
face to appropriate value from bio-data infrastructures.
<250
<100 <25M
>1M <10M
>50 %
increased
substantially
<5M
>75% <1M
<50
<50 %
<500K
Companies [%]
<1M
increased
<250K
<25 % >50%
<250K
<150K steady
>25%
0
same 0 0
<25%
dissmissed
Number of New employees in Employees with Grant funding Private funding Turnover in
employees past 12 month STEM raised raised past 12 month
background
ELIXIR Hub, South Building
Wellcome Genome Campus
Hinxton, Cambridge, CB10 1SD, UK
info@elixir-europe.org