
Research Questions:

Why is there a need for a biomedical knowledge graph like SPOKE?
Start by explaining that the biomedical field generates vast amounts of data. This data includes
information about genes, proteins, diseases, drugs, and more. However, these data are often scattered
across various databases and repositories, leading to a fragmented and compartmentalized landscape.
SPOKE was developed to address this issue by creating a unified knowledge graph that connects this
scattered information.

How is the complexity, size, and heterogeneity of biomedical information a challenge?
Describe the challenges posed by the sheer volume of data and the diversity of data types (e.g.,
genetic, clinical, pharmacological). Emphasize that managing and integrating such diverse and complex
data is a formidable task.

Why is connecting seemingly disparate information essential for precision medicine efforts?
Explain that precision medicine aims to tailor medical treatments to individual patients. To achieve
this, it's crucial to connect and cross-reference a wide range of information, from a patient's genetic
makeup to the latest drug research. SPOKE provides the infrastructure to make these connections, which
can lead to more informed and personalized healthcare decisions.

Scalable Precision Medicine Open Knowledge Engine (SPOKE)

A biomedical knowledge graph connecting various concepts via meaningful relationships.

SPOKE contains millions of nodes and edges from 41 databases, structured using 11 ontologies.

The graph is built weekly using Python scripts and offers a REST API for querying.

SPOKE aims to integrate disparate biomedical information to support precision medicine efforts.

Why should I care?
• Advancing Knowledge: Reading this paper can contribute to
your understanding of how knowledge graphs are used to
integrate and connect diverse sources of information. This
knowledge is essential in a data-driven world.
• Biomedical Insights: The paper discusses the application of
knowledge graphs in the biomedical field.
• Innovative Approaches: It introduces unique methods for
creating knowledge graphs, which can be valuable if you're
interested in data analytics, machine learning, or predictive
modeling.
• Complex Data Handling: Understanding how SPOKE handles
the complexity of biomedical data can be beneficial if you work
with diverse data sources or intend to develop your knowledge
graph.
• Collaborative Projects: The paper mentions SPOKE's role in the
Biomedical Data Translator project, showcasing the importance
of collaborative initiatives in the scientific community.
• Transparent Models: The mention of "explainable models" in
the paper may interest those concerned with transparency and
interpretability in data-driven decision-making.
Methodology
An overview of how SPOKE is built

Data Integration: SPOKE brings together data from a diverse set of 41 different sources,
allowing the integration of information from various biomedical databases.

Continuous Updates: The construction process is ongoing, with weekly updates, ensuring
that SPOKE stays current with the latest information from the source databases.

Organism Identification: It identifies organisms using the NCBI Taxonomy ID and determines
species of interest from multiple sources, which is essential for linking biological data. For
example, Escherichia coli and Bacillus subtilis are identified by their unique Taxonomy IDs,
allowing researchers to study specific bacterial species.

Protein Information: SPOKE includes protein data from UniProt, which is a well-known
protein database. This means it can provide details about proteins found in various
organisms. For instance, it might offer information about the structure, function, and known
interactions of a particular protein.
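
The identifier-based linking described above can be sketched in a few lines of Python. This is only an illustration, not SPOKE's build code: the `Node` class and `build_index` helper are hypothetical names, and the only real values are the NCBI Taxonomy IDs of the two species mentioned (562 for Escherichia coli, 1423 for Bacillus subtilis).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical node record: a type plus a stable source-database identifier."""
    node_type: str   # e.g. "Organism" or "Protein"
    identifier: str  # e.g. an NCBI Taxonomy ID
    name: str


# NCBI Taxonomy IDs for the two species named in the text.
ORGANISMS = [
    Node("Organism", "562", "Escherichia coli"),
    Node("Organism", "1423", "Bacillus subtilis"),
]


def build_index(nodes):
    """Index nodes by (type, identifier) so records from different sources share one key."""
    index = {}
    for node in nodes:
        key = (node.node_type, node.identifier)
        if key in index and index[key] != node:
            raise ValueError(f"conflicting records for {key}")
        index[key] = node
    return index


index = build_index(ORGANISMS)
print(index[("Organism", "562")].name)
```

Keying on the pair (type, identifier) rather than on the name is the design point: names vary between databases, while a Taxonomy ID does not.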

Protein Interactions: The graph incorporates information about protein interactions from
sources like STRING and IntAct, which helps to establish relationships between proteins.
Consider two proteins, A and B. SPOKE might contain information from sources like STRING
and IntAct that indicate that protein A interacts with protein B. This helps researchers
understand how different proteins work together in biological processes.
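
As a minimal sketch of the protein A / protein B example, the snippet below merges interaction records from two sources into one edge per protein pair while remembering which source reported it. The record layout, the function names, and the choice to treat interactions as undirected are assumptions made for illustration; they are not taken from SPOKE's build scripts.

```python
from collections import defaultdict

# Hypothetical interaction records: (protein_a, protein_b, source).
RECORDS = [
    ("A", "B", "STRING"),
    ("B", "A", "IntAct"),  # same pair, reported in the other order
    ("A", "C", "STRING"),
]


def merge_interactions(records):
    """Collapse records into undirected edges, keeping the set of sources for each pair."""
    edges = defaultdict(set)
    for a, b, source in records:
        edges[frozenset((a, b))].add(source)
    return dict(edges)


def partners(edges, protein):
    """Return the sorted interaction partners of one protein."""
    return sorted(
        other
        for pair in edges
        if protein in pair
        for other in pair
        if other != protein
    )


edges = merge_interactions(RECORDS)
print(partners(edges, "A"))
print(sorted(edges[frozenset(("A", "B"))]))
```

Keeping the source set on each edge preserves provenance, so a user can tell whether an interaction is supported by one database or by both.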

Genes and Diseases: SPOKE integrates data from NCBI Gene to provide information about human
genes. For instance, it might link a specific gene to diseases it is associated with. This allows
researchers to study the genetic basis of diseases and understand which genes are involved in
particular health conditions.

Compound Information: SPOKE includes data about compounds from sources like ChEMBL, DrugBank,
and the Connectivity Map project, which is crucial for understanding drug interactions and
pharmacology.

Augmentation: SPOKE goes beyond these core data sources by integrating additional databases to
enhance its utility and provide a more comprehensive view of biomedical knowledge.

Pathways:

• Imports human pathway information from sources like WikiPathways and PathwayCommons.
• Introduces a "Pathway" node type connected to genes through "Gene-participates-Pathway" edges.

Metabolic Pathways:

• Reads data from resources such as KEGG, MetaCyc, and PATRIC to incorporate metabolic pathways.
• Introduces a "Reaction" node that links to metabolites through "Reaction-consumes-Compound" and "Reaction-produces-Compound" edges.
• Adds an "EC" (Enzyme Commission) node that links to reactions via "EC-catalyzes-Reaction" edges and further links to proteins through "Protein-has-EC" edges.

Food Data:

• Integrates information from two food databases, FooDB and the Australian Food Composition Database.
• Establishes relationships with compounds and nutrients in the knowledge graph through "Food-contains-Compound" and "Food-contains-Nutrient" edges.
• Aims to incorporate the FoodOn ontology to standardize food data mapping.
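
The edge types listed above follow a "Source-relation-Target" naming pattern, which lends itself to a small schema check. The sketch below is an assumption-laden illustration: the schema set contains only the edge types named in these slides, and `parse_edge_label`, `is_valid_edge`, and `reactions_for_protein` are hypothetical helpers, as are the node IDs such as `Protein:P1`.

```python
# Edge types named in the text, as (source type, relation, target type).
EDGE_SCHEMA = {
    ("Gene", "participates", "Pathway"),
    ("Reaction", "consumes", "Compound"),
    ("Reaction", "produces", "Compound"),
    ("EC", "catalyzes", "Reaction"),
    ("Protein", "has", "EC"),
    ("Food", "contains", "Compound"),
    ("Food", "contains", "Nutrient"),
}


def parse_edge_label(label):
    """Split a label such as 'EC-catalyzes-Reaction' into its three parts."""
    source, relation, target = label.split("-")
    return source, relation, target


def is_valid_edge(label):
    """True if the label is well formed and present in the schema."""
    try:
        return parse_edge_label(label) in EDGE_SCHEMA
    except ValueError:  # wrong number of parts
        return False


def reactions_for_protein(edges, protein):
    """Follow Protein-has-EC, then EC-catalyzes-Reaction, from one protein."""
    ecs = {t for s, rel, t in edges if s == protein and rel == "has"}
    return sorted(t for s, rel, t in edges if s in ecs and rel == "catalyzes")


# Hypothetical instance edges using placeholder identifiers.
EDGES = [
    ("Protein:P1", "has", "EC:E1"),
    ("EC:E1", "catalyzes", "Reaction:R1"),
    ("EC:E1", "catalyzes", "Reaction:R2"),
    ("Reaction:R1", "consumes", "Compound:C1"),
]

print(is_valid_edge("Protein-has-EC"))
print(reactions_for_protein(EDGES, "Protein:P1"))
```

The two-hop traversal shows why the EC node is useful: proteins reach reactions through a shared enzyme classification instead of needing a direct edge for every protein and reaction pair.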

API Access: The REST API provides a means for users to access and query the nodes and edges
within SPOKE, allowing for data exploration and retrieval.

Support for Various Use Cases: The API offers different types of queries, including
meta-information, information about specific nodes, and network-related queries. This
flexibility supports a wide range of use cases in biomedical research and analysis.

The SPOKE REST API
• The API is like a bridge that allows you to access and interact with the SPOKE
knowledge graph, which contains a lot of information about biology and
medicine. It was mainly designed to work with a tool called the
Neighborhood Explorer. This API has three main parts:
1. Meta-Information: You can ask for general information about the
knowledge graph, kind of like checking a map to see what's in it.
2. Node Information: You can search for specific things in the graph, like
finding all the proteins related to a certain disease. This is a bit like
searching for specific places on a map.
3. Network Information: This part is a bit more complex, but it allows you to
explore how different things in the graph are connected. It's like looking at
a subway map and figuring out how to get from one place to another.
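
To make the three query categories concrete, here is a minimal sketch of how a client might build request URLs for each one. Everything specific in it is an assumption: the base URL is a placeholder, and the route names (`types`, `search`, `neighborhood`) are hypothetical stand-ins rather than the documented SPOKE endpoints. The code only constructs URLs and sends no requests.

```python
from urllib.parse import quote

# Placeholder base URL; replace with the real service address.
BASE_URL = "https://example.org/spoke/api"


def meta_url():
    """Meta-information: what node and edge types exist (hypothetical route)."""
    return f"{BASE_URL}/types"


def node_search_url(node_type, query):
    """Node information: search nodes of one type by name (hypothetical route)."""
    return f"{BASE_URL}/search/{quote(node_type)}/{quote(query)}"


def neighborhood_url(node_type, identifier):
    """Network information: neighbors of one node (hypothetical route)."""
    return f"{BASE_URL}/neighborhood/{quote(node_type)}/{quote(identifier)}"


print(meta_url())
print(node_search_url("Disease", "multiple sclerosis"))
print(neighborhood_url("Organism", "562"))
```

Each function corresponds to one fixed endpoint, which is the characteristic trade-off of a REST design: simple to call, but limited to the queries the service has chosen to expose.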
The main differentiation factors between a REST API and a SPARQL API

Factor | REST API | Hypothetical SPARQL API
Purpose | Accessing data from a database | Querying linked data (knowledge graph)
Access Method | HTTP methods (GET, POST, etc.) | SPARQL query language
Query Flexibility | Limited to predefined endpoints | Expressive, supports complex queries
Documentation | Detailed endpoint documentation | Documentation for SPARQL language
Query Complexity | Limited complexity in queries | Highly complex, graph pattern matching
Working with RDF data | Can work with RDF (Resource Description Framework) data, although it might not provide the same level of semantic querying and data manipulation capabilities | Specifically designed for RDF data
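
The "graph pattern matching" entry in the comparison can be illustrated with a toy matcher over triples. This is not SPARQL and not part of SPOKE; it is a simplified sketch of the idea behind a basic graph pattern, where terms starting with `?` are variables. The triples use relation names from these slides with placeholder node names.

```python
# Toy triple store with placeholder node names.
TRIPLES = [
    ("food1", "contains", "compound1"),
    ("food1", "contains", "compound2"),
    ("reaction1", "consumes", "compound1"),
]


def match(pattern, triple, bindings):
    """Try to extend the bindings so that the pattern equals the triple."""
    extended = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the same value everywhere it appears.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(triples, patterns):
    """Return all variable bindings that satisfy every pattern."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


# Which foods contain a compound that some reaction consumes?
results = query(TRIPLES, [("?food", "contains", "?c"), ("?r", "consumes", "?c")])
print(results)
```

The caller composes the question from patterns at query time, whereas a REST endpoint would need to have been written in advance for this particular join.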
The End
