Genomics & Bioinformatics
Doug Brutlag
Professor Emeritus
Biochemistry & Medicine (by courtesy)
Lee Kozar
Maeve O'Huallachain
Dan Davison
Course Requirements
Lectures
Theoretical background of current methods
Strengths and weaknesses of current approaches
Future directions for improvements
Demonstrations
Applications (Mac, PC, Unix, Web)
Web applications
Illustrate homework
David Mount
Bioinformatics: Sequence and Genome Analysis 2nd Edition
Jin Xiong
Essential Bioinformatics
Dan Gusfield
Algorithms on Strings, Trees & Sequences
NCBI Handbook
http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=handbook
Benjamin Lewin
Genes IX
Bioinformatics
Proteomics
Structural Genomics
Machine Learning
Artificial Intelligence
Robotics
Databases
Algorithms
Information Theory
Graph Theory
What is Bioinformatics?
Individuals
DNA
RNA
Protein
Phenotype
Evolution
Selection
Populations
Biological Information
DNA
RNA
Protein
Phenotype (Symptoms)
Molecular Structure
Biochemical Function
Phenotype (Symptoms)
Challenges Understanding Genetic Information
Genetic Information
Molecular Structure
Biochemical Function
Phenotype
Redundancy in Genomic & Protein Sequences
DNA is double-stranded
Genetic code
Acceptable amino-acid replacements
Intron-exon variation
Alternative splicing
Strain variations (SNPs)
Sequencing errors
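Two of the sources of redundancy listed above can be made concrete with a short sketch: the same locus can be read from either strand, and several codons encode the same amino acid. The function names and the small codon excerpt below are illustrative choices, not part of the course material; the excerpt covers only a few entries of the standard genetic code.

```python
# Redundancy from the second strand and from the degenerate genetic code.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


# Excerpt of the standard genetic code (not the full 64-codon table).
CODON_TABLE = {
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L", "TTA": "L", "TTG": "L",
    "GAT": "D", "GAC": "D",
    "ATG": "M",
    "TGG": "W",
}


def synonymous_codons(amino_acid: str) -> list[str]:
    """List the codons in the excerpt that encode the given amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


if __name__ == "__main__":
    print(reverse_complement("GATGGC"))   # the same locus on the other strand
    print(synonymous_codons("L"))         # six codons, one amino acid
```

A search that ignores either effect can miss a gene that lies on the opposite strand, or treat two DNA sequences as different when they encode the same protein.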
Sequences of Common
Structure or Function
Sequence Similarity
              10        20        30        40        50
Query VLSPADKTNVKAAWGKVGAHAGEVGAEALERMFLSFPTTKTYFPHF------DLSHGS
       |:| :|: | |:||||     | |:||| |: : :|:|  :|  |      |:  |
Match HLTPEEKSAVTALWGKV--NVDEYGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGN
              10        20        30        40        50
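As a minimal sketch of how the similarity in such an alignment can be quantified, the code below marks identical residues and computes percent identity over the columns where neither sequence has a gap. The two strings are the Query and Match rows from the slide; the function names are illustrative. Only identities (`|`) are marked, because the `:` marks for similar residues depend on a substitution matrix that the slide does not specify.

```python
# Percent identity for a pairwise alignment given as two gapped strings.

QUERY = "VLSPADKTNVKAAWGKVGAHAGEVGAEALERMFLSFPTTKTYFPHF------DLSHGS"
MATCH = "HLTPEEKSAVTALWGKV--NVDEYGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGN"


def identity_line(a: str, b: str) -> str:
    """Return a line with '|' under each column where the residues are identical."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return "".join("|" if x == y and x != "-" else " " for x, y in zip(a, b))


def percent_identity(a: str, b: str) -> float:
    """Identical columns as a percentage of the columns without a gap."""
    aligned = sum(1 for x, y in zip(a, b) if x != "-" and y != "-")
    identical = identity_line(a, b).count("|")
    return 100.0 * identical / aligned


if __name__ == "__main__":
    print(QUERY)
    print(identity_line(QUERY, MATCH))
    print(MATCH)
    print(f"{percent_identity(QUERY, MATCH):.1f}% identity")
```

Other conventions divide by the full alignment length or by the shorter sequence, so a reported percentage is only comparable when the denominator is stated.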
A Typical Motif:
Zinc Finger DNA Binding Motif
C..C............H....H
2 1 3 13 10 12 67 4 13 9 1 2
7 5 8 9 4 0 1 16 7 0 1 0
0 8 0 1 0 0 0 2 1 1 10 0
0 1 0 1 13 0 0 12 1 0 4 0
0 0 1 0 0 0 0 0 0 2 2 1
1 1 21 8 10 0 0 7 6 0 0 2
2 0 0 9 21 0 0 15 7 3 3 0
9 7 1 4 0 0 8 0 0 0 46 0
4 3 1 1 2 0 0 2 2 0 5 0
10 0 11 1 2 10 0 4 9 3 0 16
16 1 17 0 1 31 0 3 11 24 0 14
3 4 5 10 11 1 1 13 10 0 5 2
7 1 1 0 0 0 0 0 5 7 1 8
4 0 3 0 0 4 0 0 0 10 0 0
0 6 0 1 0 0 0 0 0 0 0 0
1 17 0 8 3 1 3 0 2 2 2 0
5 22 3 11 1 5 0 2 2 2 0 5
2 0 0 0 0 0 0 0 0 1 0 1
1 0 4 2 0 1 0 0 2 4 0 1
6 3 1 1 2 15 0 0 2 12 0 28
Consensus Sequences or Sequence Motifs
Zinc Finger (C2H2 type)
C x {2,4} C x {12} H x {3,5} H
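A consensus pattern of this kind maps directly onto a regular expression: each `x` becomes `.` (any residue) and the repeat ranges carry over unchanged. The sketch below assumes the pattern exactly as written on the slide; the function name and the test peptide are made up for illustration and are not taken from a database entry.

```python
import re

# C x{2,4} C x{12} H x{3,5} H written as a regular expression.
ZINC_FINGER_C2H2 = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_zinc_fingers(protein: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in ZINC_FINGER_C2H2.finditer(protein)]


if __name__ == "__main__":
    # Hypothetical peptide built to fit the pattern.
    peptide = "YKCPECGKSFSQKSNLQKHQRTH"
    for start, text in find_zinc_fingers(peptide):
        print(start, text)
```

A pattern like this either matches or it does not; it cannot express that some residues are merely more likely than others at a position, which is the gap that profiles and hidden Markov models address.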
Profiles, PSI-BLAST
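The basic idea behind a profile can be sketched as follows: count the residues in each column of a set of aligned segments, add pseudocounts, and convert the frequencies to log-odds scores against a background distribution. Everything specific here is an assumption made for illustration: the toy segments, the uniform background, the pseudocount of 1, and the function names. PSI-BLAST builds its profiles iteratively with more refined sequence weighting and background frequencies, which this sketch does not attempt to reproduce.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Return one {residue: log2-odds score} dictionary per alignment column."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all segments must have the same length")
    background = 1.0 / len(AMINO_ACIDS)          # assumed uniform background
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score(profile: list[dict[str, float]], segment: str) -> float:
    """Sum the per-column scores of an ungapped segment of profile length."""
    return sum(column[aa] for column, aa in zip(profile, segment))


if __name__ == "__main__":
    # Hypothetical gap-free segments, chosen only to show the mechanics.
    segments = ["LCGSHLVEA", "LCGSHLVEA", "LCGSNLVET", "LCGSHLVDA"]
    profile = build_profile(segments)
    print(round(score(profile, "LCGSHLVEA"), 2))
    print(round(score(profile, "AAAAAAAAA"), 2))
```

Unlike a consensus pattern, the profile gives a graded score, so a segment that deviates at one weakly conserved column still scores well.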
Sequences of Common Structure or Function
Sequence Similarity
Hidden Markov Models
[Figure: profile hidden Markov model with delete states D2 to D5, insert states I1 to I5, and amino-acid states AA1 to AA6]
Buried Treasure
[Figure: multiple sequence alignment of the insulin precursor sequences IPGP, IPDK, IPDG, IPCH, IPCA, IPBO and IPAF, alignment positions 20 to 120, with a consensus line marking conserved residues]
DNA Damage
Fibroblast Stimulation
B Cells Signaling
CMV Infection
Anoxia
Polio Infection
Monocytes Signaling IL4
Hormone
TAMO: Tools for the Analysis of Motifs
Upstream Regions
Co-expressed Genes
Pho 5 GATGGCTGCACCACGTGTATGC...ACGATGTCTCGC
Pho 8 CACATCGCATCACGTGACCAGT...GACATGGACGGC
Pho 81 GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
Pho 84 TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG
Pho CGCTAGCCCACGTGGATCTTGA...AGAATGACTGGC
Transcription Start
Upstream Regions
Co-expressed
Genes
GATGGCTGCACCACGTGTATGC...ACGATGTCTCGC
CACATCGCATCACGTGACCAGT...GACATGGACGGC
GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG
CGCTAGCCCACGTGGATCTTGT...AGAATGGCCTAT
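One simple way to look for a shared regulatory signal in upstream regions like these is to enumerate every k-mer and keep those present in all sequences. The sketch below uses only the fragments shown before each `...` on the slide; the elided parts are not filled in. The choice of k = 6 and the function names are assumptions for illustration, and this exhaustive search is not a description of how TAMO itself works.

```python
# Exact k-mers shared by every upstream fragment.

UPSTREAM_FRAGMENTS = [
    "GATGGCTGCACCACGTGTATGC",
    "CACATCGCATCACGTGACCAGT",
    "GCCTCGCACGTGGTGGTACAGT",
    "TCTCGTTAGGACCATCACGTGA",
    "CGCTAGCCCACGTGGATCTTGT",
]


def kmers(sequence: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_kmers(sequences: list[str], k: int) -> set[str]:
    """Return the k-mers that occur in every sequence."""
    common = kmers(sequences[0], k)
    for sequence in sequences[1:]:
        common &= kmers(sequence, k)
    return common


if __name__ == "__main__":
    print(shared_kmers(UPSTREAM_FRAGMENTS, 6))
```

For these five fragments the only shared hexamer is CACGTG. Exact matching is brittle, though: a single substitution in one promoter removes the k-mer from the intersection.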
Upstream Regions
Co-expressed
Genes
ATGGCTGCACCACGTTTATGC...ACGATGTCTCGC
CACATCGCATCACGTGACCAGT...GACATGGACGGC
GCCTCGCACGTGGTGGTACAGT...AACATGACTA
TTAGGACCATCACGTGA...ACAATGAGAGCG
CGCTAGCCCACGTTGATCTTGT...AGAATGGCCTAT
Pho4 binding
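In this version of the upstream regions two of the fragments carry CACGTT where the others have CACGTG, so an exact search for the hexamer no longer finds it in every sequence. A minimal sketch of a more tolerant search reports, for each fragment, the window with the fewest mismatches to a candidate motif. The motif CACGTG is taken from the fragments themselves; reading it as the site the slide labels "Pho4 binding" is an interpretation, and the function names are illustrative.

```python
# Closest occurrence of a motif in each fragment, allowing mismatches.

VARIANT_FRAGMENTS = [
    "ATGGCTGCACCACGTTTATGC",
    "CACATCGCATCACGTGACCAGT",
    "GCCTCGCACGTGGTGGTACAGT",
    "TTAGGACCATCACGTGA",
    "CGCTAGCCCACGTTGATCTTGT",
]


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_match(sequence: str, motif: str) -> tuple[int, int]:
    """Return (mismatches, 0-based offset) of the window closest to the motif."""
    k = len(motif)
    if len(sequence) < k:
        raise ValueError("sequence is shorter than the motif")
    return min(
        (hamming(sequence[i:i + k], motif), i)
        for i in range(len(sequence) - k + 1)
    )


if __name__ == "__main__":
    for fragment in VARIANT_FRAGMENTS:
        mismatches, offset = best_match(fragment, "CACGTG")
        print(fragment, fragment[offset:offset + 6], mismatches)
```

Allowing one mismatch recovers a site in all five fragments, which is the motivation for weighted or probabilistic motif models over exact strings.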
Understanding Metabolism
Understanding Disease
Inherited Diseases - OMIM
Infectious Diseases
Pathogenic Bacteria
Viruses