
Gene ontology and pathways

Ståle Nygård
stale.nygard@medisin.uio.no

Bioinformatics Core Facility,


Oslo University Hospital/University of Oslo
So: here you are
Gene lists

• Long list of
differentially
expressed genes
• Possibly hundreds
of papers
describing the
functions of the
genes
• Misleading names
• Different names in
different organisms
Genes seldom operate on their
own
-Genes are by nature not independent.
Biologically related genes will often show
expression changes together

-Trends supported by several genes in a
group give more power to statistical tests
than a test on an individual gene

-Need predefined groups of biologically


related genes to help process our list for
systematic changes.
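The power argument above can be made concrete with a small worked formula. This is only a sketch under simplifying assumptions that are not in the slides: every one of the m genes in a group has the same true shift \delta, the same noise variance \sigma^2, and the same pairwise correlation \rho (all symbols are hypothetical).

```latex
% Variance of the group mean of m equally correlated genes,
% and the resulting standardized effect compared with a single gene.
\[
\operatorname{Var}(\bar{x}) = \frac{\sigma^2}{m}\bigl(1 + (m-1)\rho\bigr),
\qquad
z_{\text{group}} = \frac{\delta}{\sigma}\sqrt{\frac{m}{1 + (m-1)\rho}}
\quad\text{vs.}\quad
z_{\text{gene}} = \frac{\delta}{\sigma}.
\]
```

For independent genes (\rho = 0) the group statistic is larger by a factor of \sqrt{m}. The gain shrinks as \rho grows, but for 0 \le \rho < 1 the group statistic stays above the single-gene one.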
Ontologies
• Gene Ontology (GO)
• Sequence Ontology (SO) (sequence
features)
• Phenotype and Trait Ontology (PATO)
• Taxon (NCBI)
• Anatomy (Penn)
• Disease (ICD9)
• Developmental stage (multiple sources)
Gene Ontology (GO)
• Why Gene Ontology?
– Produce a controlled vocabulary describing
aspects of molecular biology, that can be
applied to all organisms.
– Facilitate communication between people and
organizations.
– Improve interoperability between systems.
Goal of GO Consortium
(http://www.geneontology.org/)

• Produce a controlled vocabulary
describing aspects of molecular biology,
that can be applied to all organisms.
• Describe gene products using vocabulary
terms (annotation).
• Develop tools:
– to query and modify the vocabularies and
annotations
How does GO work?

What information might we want to capture


about a gene product?

• What does the gene product do?


• Why does it perform these activities?
• Where does it act?
The Gene Ontology (GO)
– Molecular function:
• Gene product at biochemical level.

– Biological process:
• Cellular events to which the gene product
contributes.

– Cellular component:
• Location or complex of gene/protein.
Molecular Function
• activities or “jobs” of a gene product

Insulin binding
Insulin transport activity
Biological Process
• a commonly recognized series of events

cell division
Cellular Component
• where a gene product acts
Content of GO

Molecular Function 8,731 terms


Biological Process 19,022 terms
Cellular Component 2,737 terms

Total 30,490 terms

Obsolete terms: 1434

As of May 2010
GO Annotation
• Association between gene product and
applicable GO terms
• Provided by member databases. Collaborating
databases annotate their gene products (or genes)
with GO terms, providing references and indicating
what kind of evidence is available to support the
annotations.
• Made by manual or automated methods.
• GO Annotation
• Database object: gene or gene product
• GO term ID
• Evidence supporting annotation
• Reference
– publication or computational method
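The four parts of an annotation listed above map naturally onto a small record type. The following is a minimal sketch, not the GO Consortium's file format: the class name, field names, gene names, GO IDs and references are all hypothetical placeholders. Only the evidence codes IDA (inferred from direct assay) and IEA (inferred from electronic annotation) are real GO codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GOAnnotation:
    """One association between a gene product and a GO term (illustrative only)."""
    db_object: str   # gene or gene product identifier
    go_id: str       # GO term ID
    evidence: str    # evidence code supporting the annotation
    reference: str   # publication or computational method


# Placeholder records; the IDs do not refer to real genes, terms or papers.
annotations = [
    GOAnnotation("GENE_A", "GO:0000001", "IDA", "PMID:0000001"),
    GOAnnotation("GENE_A", "GO:0000002", "IEA", "automated-method-1"),
    GOAnnotation("GENE_B", "GO:0000001", "IEA", "automated-method-1"),
]


def terms_by_gene(records):
    """Index annotations as gene -> set of GO term IDs."""
    index = {}
    for rec in records:
        index.setdefault(rec.db_object, set()).add(rec.go_id)
    return index


gene_to_terms = terms_by_gene(annotations)
```

Keeping the evidence code on each record makes it possible to filter, for example, to manually curated annotations before any downstream analysis.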
Overrepresentation of GO terms
• We have a subset of genes
– List of differentially expressed genes
– List of genes that cluster together

• Which biological processes do these


genes take part in?

• Is there an over-representation of the


number of genes belonging to a particular
biological process, compared to what
could be expected?
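One common way to answer the over-representation question is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below assumes that test; the slides do not say which statistic any particular tool uses. The function name and the toy numbers are hypothetical.

```python
from math import comb


def overrepresentation_p(n_array, n_term, n_list, n_overlap):
    """P(X >= n_overlap) when n_list genes are drawn without replacement
    from n_array genes, of which n_term are annotated to the GO term."""
    upper = min(n_term, n_list)
    total = comb(n_array, n_list)
    tail = sum(
        comb(n_term, k) * comb(n_array - n_term, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers (assumptions): 20 genes on the array, 5 annotated to the term,
# 4 genes in our list, 3 of which carry the term.
p_value = overrepresentation_p(n_array=20, n_term=5, n_list=4, n_overlap=3)
```

In practice the test is repeated for every GO term, so the resulting p-values need a multiple-testing correction before terms are ranked by significance.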
Gene Ontology Tools
• eGON (from NTNU, www.genetools.no)
• GSEA
• DAVID
• EASE
• TopGO
• GOstat
• + many more
Question:
which cellular biological processes occur?

[Figure: time axis 0-24 hours; human fibroblasts, 24 h time course after thymidine-block release]
Questions

What is the function of up-regulated genes?

What is the function of down-regulated genes?

[Figure: time axis 0-24 hours; human fibroblasts, 24 h time course after thymidine-block release]
173 genes up-regulated 0-4 hours compared to all
genes on the array

Ordered by significance:
146 genes down-regulated 0-4 hours compared to
all genes on the array
[Figure: GO terms placed along the 0-24 hour time axis (human fibroblasts, 24 h time course after thymidine-block release): homeostasis, cell adhesion, lipid transport, chemotaxis, amino acid metabolism, lipid metabolism, response to stress, cell signaling, S-phase, ion transport, apoptosis, cell cycle arrest]
Biological pathways
Types of pathways
• Metabolic pathways
– convert raw materials from the environment
into value-added products and recycle or
dispose of intracellular materials
• Signaling pathways
– convert mechanical/chemical stimulus to a cell
into a specific cellular response
• Regulatory pathways
– alter the output of the genetic program
through transcriptional and translational
regulation
• Signaling, regulatory and metabolic
events are often linked
[Diagram: Signaling, Regulatory, Metabolic]
Types of pathway representations
• Cartoons
– Textbooks
– Biocarta
• Circuit diagrams
– KEGG
– Reactome
– GeneRIFs
• Computational networks
– SBML models
– Transcription factor
networks
KEGG
• A large collection of signaling, metabolic
and regulatory pathways
• Organised by separate pathways with
hand drawn diagrams
• Academic (freely available)
• The pathways can be used to look for
overrepresentation or enrichment
• Can be used to visually check for
path-ness or direction
TGF-beta signalling pathway
Same pathway in Biocarta
GO vs. Pathways
GO:
• Overview
• Can handle a large number of genes
• Many genes annotated
• Every gene considered on its own
Pathways:
• Detail view
• Focused sets of genes
• Scattered data sources
• Focuses on interactions between genes
Network construction
• Information about established pathways
(e.g. in KEGG) is far from complete
• Pathways interact and depend on context
• An alternative approach to using
established pathways is to construct
networks from the data.
Network construction
• Networks can be inferred from
– correlation in the data (recall gene clustering)
and/or
– interaction databases:
• Protein-protein interactions: BioGRID,
IntAct, DIP, HPRD ++
• Transcription factor databases:
TRANSFAC, JASPAR ++
• Literature: PubGene
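The two sources of edges listed above can be combined in a few lines. This is a minimal sketch under stated assumptions: Pearson correlation with a fixed cut-off stands in for "correlation in the data", and a plain list of gene pairs stands in for an export from an interaction database. Gene names, expression values, the threshold and the function names are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return sxy / sqrt(sxx * syy)


def build_network(expression, known_interactions, min_abs_r=0.9):
    """Return {gene pair: set of evidence sources} for undirected edges."""
    edges = {}
    # Edges supported by correlation in the data.
    for g1, g2 in combinations(sorted(expression), 2):
        if abs(pearson(expression[g1], expression[g2])) >= min_abs_r:
            edges[frozenset((g1, g2))] = {"correlation"}
    # Edges supported by a database, restricted to measured genes.
    for g1, g2 in known_interactions:
        if g1 in expression and g2 in expression:
            edges.setdefault(frozenset((g1, g2)), set()).add("database")
    return edges


# Toy data (assumptions): three genes measured in four samples.
expression = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, -1, 1, -1]}
known_interactions = [("B", "C"), ("A", "B"), ("C", "Z")]
edges = build_network(expression, known_interactions)
```

Recording the evidence source on each edge keeps it visible whether a link comes from the data, from prior knowledge, or from both.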
Network construction: case study

WT AB CXCR5 KO AB

Mice with the chemokine receptor CXCR5
knocked out develop dilated hypertrophy
after banding of the aorta.
Microarray study

WT SHAM (n=3), KO SHAM (n=3), WT AB (n=4), KO AB (n=4)

Aim of study: Find the molecular mechanism
behind the altered phenotype of the heart.
Network construction using prior knowledge

This method constructs a network of


interacting genes based on literature
reported interactions, protein-protein
interactions and correlations in the data.
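Once such a network exists, one simple way to look for groups of interacting genes is to restrict it to the differentially expressed genes and take connected components. The sketch below illustrates that idea only; it is not the method used in the study, and the gene labels, edges and function name are hypothetical.

```python
def connected_clusters(edges, genes_of_interest):
    """Connected components of the network restricted to genes_of_interest,
    largest first. `edges` is an iterable of (gene, gene) pairs."""
    adjacency = {g: set() for g in genes_of_interest}
    for a, b in edges:
        if a in adjacency and b in adjacency:
            adjacency[a].add(b)
            adjacency[b].add(a)

    seen, clusters = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:  # depth-first traversal from `start`
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(adjacency[gene] - component)
        seen |= component
        clusters.append(component)
    return sorted(clusters, key=len, reverse=True)


# Toy input (assumptions): G9 is in the network but not differentially expressed.
toy_edges = [("G1", "G2"), ("G2", "G3"), ("G4", "G5"), ("G3", "G9")]
de_genes = {"G1", "G2", "G3", "G4", "G5", "G6"}
clusters = connected_clusters(toy_edges, de_genes)
```

A large component whose members share an annotation, such as a common cellular component, is then a candidate for closer biological inspection.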
Results
KO AB vs KO SHAM
• FMOD - fibromodulin: …may regulate TGF-beta activities by sequestering TGF-beta into the extracellular matrix
• CXCL13: B lymphocyte chemoattractant
• Thbs1 - thrombospondin 1: Adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions.
• Fn1 - fibronectin 1: Extracellular matrix glycoprotein that binds to membrane-spanning receptor proteins called integrins.
• Tgfb2 - transforming growth factor, beta 2: Extracellular glycosylated protein.
• Spp1 - secreted phosphoprotein 1: Cytokine. Probably important to cell-matrix interaction
• Thbs4 - thrombospondin 4
• Lox - lysyl oxidase: Extracellular copper enzyme that initiates the crosslinking of collagens and elastin.
• Col14a1 - collagen, type XIV, alpha 1

The method finds a cluster of differentially expressed
genes localized to the extracellular matrix
Conclusion
• GO is the world map of molecular biology

• Pathways provide more detailed


information

• Network construction using interaction


databases can reveal information beyond
classical pathways
Questions?
