OPT-3GNO. Functional genomics and bioinformatics
Semester 2
Functional genomics and bioinformatics - 3GNO Lecture Notes
Dr. Simon Hubbard (H13) x. 8930, email: Simon.Hubbard@umist.ac.uk
Learning outcomes:

In these lectures, we will build on the 2GEN lectures and previous lectures in the 3GNO lecture series to examine some applications of bioinformatics to functional genomics. At the end of the lectures you should:

• Be able to describe how bioinformatics is used as a tool to functionally annotate genomes
• Be able to discuss issues in assigning biological function
• Recognise novel bioinformatics techniques currently being developed for comparative and functional genomics to predict gene function
• Be able to discuss how bioinformatics approaches can identify targets of clinical interest
• Be able to describe how bioinformatics is used to analyse gene expression data to predict protein function and in clinical diagnostics
• Be able to recognise the utility of bioinformatics applications in proteomics data analysis for functional genomics
Bioinformatics

In second year lectures, we considered how sequence similarity searching (with programs such as BLAST and FASTA) can find homologous genes and proteins, based on common patterns in their primary sequences. The function of the unknown gene or protein sequence is inferred from the function of the database match, based on the hypothesis that the proteins share a common ancestor from which they have both evolved. These methods typically begin to fail around the "twilight zone" of c. 25% pairwise sequence identity, below which no statistical significance can be assigned to a match, even when the proteins do share a common ancestral gene. However, many genes in the yeast genome were seen to fall into this category: roughly 2000 so-called "orphan" open reading frames, with no orthologues in the databases that carry functional annotation. New methods are beginning to appear that extend functional classification of genome sequences beyond these traditional bioinformatics approaches.
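
The "twilight zone" is defined in terms of pairwise sequence identity, which can be computed from an existing alignment. The following is a minimal sketch, assuming the two sequences are already aligned with `-` as the gap character; the function names and the choice of denominator (columns where both sequences have a residue) are illustrative assumptions, and other definitions of percent identity are in use.

```python
TWILIGHT_ZONE = 25.0  # approximate threshold quoted in the notes (c. 25%)


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns with no gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def in_twilight_zone(aligned_a: str, aligned_b: str) -> bool:
    """True when identity is at or below the (approximate) twilight-zone threshold."""
    return percent_identity(aligned_a, aligned_b) <= TWILIGHT_ZONE
```

A pair flagged by `in_twilight_zone` may still be homologous; the point made above is only that sequence identity alone can no longer demonstrate it.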

A schematic of the process involved in annotating a genome sequence is shown above. We hope we can find many orthologous genes from other genomes whose function is well characterised.

Terminology: Homologous proteins share a common ancestral gene from which they have evolved.
Orthologous sequences are homologous sequences that are now in different organisms' genomes.
Paralogous sequences are homologues which share sequence and functional similarity but have most likely arisen from a gene duplication; they may still be in the same genome.
[Figure: Genome assembly, gene hunting, annotation. Flow chart with boxes: cosmids, DNA sequencers, raw sequence, assembly, contigs, find genes and ORFs. Decisions: Homologue? No: orphan. Yes: Functional information? Yes: orthologue of known function, leading to annotation. No: homologue.]
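
The two decision steps at the end of the annotation schematic (is there a homologue, and does it carry functional information?) can be written as a small function. This is only a sketch of that logic as read from the schematic; the function name, its arguments, and the returned labels are hypothetical.

```python
from typing import Optional


def classify_orf(has_homologue: bool, homologue_function: Optional[str]) -> str:
    """Classify a predicted ORF following the two decisions in the schematic."""
    if not has_homologue:
        # No database match at all: an "orphan" ORF.
        return "orphan"
    if homologue_function:
        # A match with known function: the annotation can be transferred.
        return "orthologue of known function: " + homologue_function
    # A match exists, but nothing is known about its function.
    return "homologue"
```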
PSI-BLAST
Position-Specific Iterated BLAST. A more sensitive method [1,2]. Finds more distant homologues than standard BLAST, hence allows more annotations to be made.
Pros: sensitivity; automatic alignments are obtained
Cons: false positives; errors are propagated
Annotation using protein structures
SUPERFAMILY resource (http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/)
HMMs built for every structure in the SCOP database at superfamily level
Applying them to sequenced genomes shows that 40-50% of each genome's sequences match known structures
Most proteins are multi-domain
[Figure: PSI-BLAST (1). Flow chart: unknown (new) query sequence; BLAST against database; align all "hits" with score better than threshold; build profile; BLAST profile against database; if no new hits, STOP, otherwise repeat.]
[Figure: PSI-BLAST (2). An "unknown" query sequence surrounded by real biological matches and false positives.]
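
The PSI-BLAST loop (search, keep hits above a threshold, build a profile, search again with the profile, stop when no new hits appear) can be illustrated with a toy model. The sketch below is not PSI-BLAST itself: it assumes ungapped sequences of equal length, uniform background frequencies, a simple pseudocount, and a raw log-odds score threshold instead of E-values. All names and parameters are hypothetical.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(ALPHABET)  # assumption: uniform background frequencies


def build_profile(seqs, pseudocount=1.0):
    """Position-specific log-odds scores from equal-length, ungapped sequences."""
    profile = []
    for i in range(len(seqs[0])):
        column = [s[i] for s in seqs]
        total = len(column) + pseudocount * len(ALPHABET)
        profile.append({
            aa: math.log(((column.count(aa) + pseudocount) / total) / BACKGROUND)
            for aa in ALPHABET
        })
    return profile


def score(profile, seq):
    """Sum of per-position log-odds scores (residues must be in ALPHABET)."""
    return sum(col[aa] for col, aa in zip(profile, seq))


def iterative_search(query, database, threshold, max_iterations=5):
    """Repeat profile searches until an iteration finds no new hits."""
    candidates = [s for s in database if len(s) == len(query)]
    hits = set()
    for _ in range(max_iterations):
        # The profile is rebuilt from the query plus every hit accepted so far.
        profile = build_profile([query] + sorted(hits))
        found = {s for s in candidates if score(profile, s) >= threshold}
        if found <= hits:  # no new hits: STOP
            break
        hits |= found
    return hits
```

With the query `"ACDEFG"` and a database containing `"ACDEWW"` and `"KCDEWW"`, a threshold of 2.0 accepts only `"ACDEWW"` in the first pass; once it is in the profile, `"KCDEWW"` scores above the threshold too. The same mechanism explains the cons listed above: a false positive accepted in one iteration shapes the profile for every later iteration, so the error is propagated.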
Real genome annotation
What have we learnt from the human genome?
Plasmodium genome: lots of new drug targets identified.
Errors in Genome annotation
Can be assessed by comparing assignments of EC numbers [3], or by comparing different groups annotating different genomes (Steve Brenner group)
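
Comparing EC number assignments works because EC numbers are hierarchical (four dot-separated levels), so two annotations can agree fully, partially, or not at all. The following is a minimal sketch of such a comparison; the function name and the handling of unspecified levels (written `-`) are assumptions for illustration, not the method used in the cited work.

```python
def ec_agreement_level(ec_a: str, ec_b: str) -> int:
    """Number of leading EC levels (0 to 4) on which two assignments agree."""
    level = 0
    for a, b in zip(ec_a.split("."), ec_b.split(".")):
        # Stop at the first disagreement or at an unspecified level ("-").
        if a != b or a == "-":
            break
        level += 1
    return level
```

A result of 4 means the two annotations name the same enzyme, 3 means they differ only in the final level, and 0 means they disagree even on the top-level enzyme class.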
Human Genome, ENSEMBL: www.ensembl.org
See ref. Nature 409, 861-921 (p.896 onwards)
[Figure: pie chart, labelled "74% of ~30000 genes" and "BLASTP & PSI-BLAST"]
  eukaryote and prokaryote: 21%
  prokaryotes only: 0%
  vertebrate only: 22%
  vertebrate and other animals: 24%
  no animal homology: 1%
  animals and other eukaryotes: 32%
See Nature, 415, p.702-
14 chromosomes, c. 5000 genes