On WebCT
-- “The $1000 genome”
-- review of new sequencing techniques by George Church
Why sequence DNA?
MC chapter 12
Methods of sequencing
b) Extend the primer with DNA polymerase in the presence of all four dNTPs, with a limited amount of a dideoxy NTP (ddNTP)
Direction of DNA polymerase travel: 5’ to 3’
Template strand: 3’ T TT T 5’
ddATP in the reaction: anywhere there’s a T in the template strand, occasionally a ddA will be added to the growing strand
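A minimal sketch of the chain-termination idea, assuming an idealized reaction in which every possible termination product is recovered. The function names and the toy template are illustrative, not from the slides; fragment lengths count only the bases added after the primer.

```python
# Watson-Crick pairing: which base is added opposite each template base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def terminated_lengths(template_3to5, dd_base):
    """Fragment lengths produced when one ddNTP is present.

    template_3to5 is the template written 3'->5', i.e. in the order the
    polymerase reads it. A fragment ends wherever dd_base is incorporated.
    """
    return [i for i, t in enumerate(template_3to5, start=1)
            if COMPLEMENT[t] == dd_base]

def read_sequence(template_3to5):
    """Run four reactions (ddA, ddC, ddG, ddT) and read the ladder by size."""
    ladder = {}
    for dd in "ACGT":
        for n in terminated_lengths(template_3to5, dd):
            ladder[n] = dd
    # Sorting by fragment length plays the role of the gel or capillary.
    return "".join(ladder[n] for n in sorted(ladder))

template = "TGTTCAT"                        # hypothetical template, 3'->5'
a_lane = terminated_lengths(template, "A")  # one fragment per template T
new_strand = read_sequence(template)        # growing strand, 5'->3'
```

With ddATP, a fragment ends opposite each T in the template, so the A lane alone reports where the Ts are; the four lanes together give the full sequence.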
How to visualize DNA fragments?
• Radioactivity
– Radiolabeled primers (kinase with 32P)
– Radiolabeled dNTPs (alpha 35S or 32P)
• Fluorescence
– ddNTPs chemically synthesized to contain fluors
– Each ddNTP fluoresces at a different wavelength
allowing identification
Analysis of sequencing products:
Click on:
“manipulation”
“techniques”
“sorting and sequencing”
An automated sequencer
The output
Current trends in sequencing:
It is rare for labs to do their own sequencing:
--costly, perishable reagents
--time consuming
--success rate varies
BAC sequence: ~160 kbp
Sequence reads: ~1 kbp
Assemble sequences by matching overlaps
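Assembly by matching overlaps can be sketched with a greedy merge: repeatedly join the two reads that share the longest suffix-to-prefix overlap. This is a toy illustration under strong assumptions (error-free reads, one strand, no repeats); the function names and the `min_len` cutoff are hypothetical, and real assemblers are far more elaborate.

```python
def overlap(a, b, min_len=4):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=4):
    """Merge reads pairwise by best overlap until no overlaps remain."""
    reads = list(reads)
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps; the remaining contigs stay separate
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads

# Three short, made-up reads that tile one region.
contigs = greedy_assemble(["ATGCACCAC", "CCACCCCATG", "CATGGGTATG"])
```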
Amplify the DNA on each bead so that copies cover the bead and boost the signal
The readout is recorded by a detector that measures the position and intensity of light flashes
25 million bases in
about 4 hours
From www.454.com
APS = Adenosine phosphosulfate
Height of peak indicates the number of
dNTPs added
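Because peak height indicates how many dNTPs were added in a flow, a read can be decoded by rounding each peak to a whole number of bases. The sketch below assumes one dNTP type per flow in a fixed repeating order and clean, well-separated peaks; the flow order and the peak heights are made-up values for illustration.

```python
def decode_flowgram(flow_order, peak_heights):
    """Convert per-flow peak heights into a base sequence.

    Each flow supplies one dNTP type. A peak of about 1 means one base
    was incorporated, about 2 means a run of two identical bases, and
    about 0 means that dNTP was not the next base.
    """
    bases = []
    for base, height in zip(flow_order, peak_heights):
        bases.append(base * round(height))
    return "".join(bases)

flows = "TACG" * 2                                         # assumed flow order
heights = [1.02, 0.05, 2.10, 0.0, 0.0, 0.97, 0.0, 3.05]    # made-up signal
read = decode_flowgram(flows, heights)
```

Long homopolymer runs are the weak point of this scheme, since the difference between, say, 7 and 8 bases is a small fraction of the peak height.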
Introduction to bioinformatics
1) Making biological sense of DNA
sequences
2) Online databases: a brief survey
3) Database in depth: NCBI
4) What is BLAST?
5) Using BLAST for sequence analysis
6) “Biology workbench”, etc.
www.ncbi.nlm.nih.gov
www.tigr.org
http://workbench.sdsc.edu
There’s plenty of DNA to make sense of
http://www.genomesonline.org/
(2006)
Making sense of genome sequences:
1) Genes
a) Protein-coding
• Where are the open reading frames?
• What are the ORFs most similar to? (What is the
function/structure/evolutionary history?)
b) RNA
2) Non-genes
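Locating open reading frames can be sketched as a scan of each reading frame for an ATG followed by an in-frame stop codon. This minimal version covers only the three frames of the given strand (a full search would also scan the reverse complement); the function name and the `min_codons` cutoff are illustrative assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) tuples for ATG...stop ORFs.

    Coordinates are 0-based; end is exclusive and includes the stop codon.
    min_codons counts the codons before the stop, ATG included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None                   # look for the next ORF
    return orfs

orfs = find_orfs("ccATGAAATTTTAGgg")           # toy sequence
```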
Computer calls
GNNTNNTGTGNCGGATACAATTCCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACATATGCACCACCAC
CACCACCACCCCATGGGTATGAATAAGCAAAAGGTTTGTCCTGCTTGTGAATCTGCGGAACTTATTTATGATCCAGAAAG
GGGGGAAATAGTCTGTGCCAAGTGCGGTTATGTAATAGAAGAGAACATAATTGATATGGGTCCTAAGTGGCGTGCTTTTG
ATGCTTCTCAAAGGGAACGCAGGTCTAGAACTGGTGCACCAGAAAGTATTCTTCTTCATGACAAGGGGCTTTCAACTGCA
ATTGGAATTGACAGATCGCTTTCCGGATTAATGAGAGAGAAGATGTACCGTTTGAGGAAGTGGCANTCCANATTANGAGT
TAGTGATGCAGCANANAGGAACCTAGCTTTTGCCCTAAGTGAGTTGGATAGAATTNCTGCTCAGTTAAAACTTCCNNGAC
ATGTAGAGGAAGAAGCTGCAANGCTGNACANAGANGCAGNGNGANAGGGACTTATTNGANGCAGATCTATTGAGAGCGTT
ATGGCGGCANGTGTTTACCCTGCTTGTAGGTTATTAAAAGNTCCCGGGACTCTGGATGAGATTGCTGATATTGCTAGAGC
atgttgtatttgtctgaagaaaataaatccgtatccactccttgccctcctg
ataagattatctttgatgcagagaggggggagtacatttgctctgaaact
ggagaagttttagaagataaaattatagatcaagggccagagtggagg
gccttcacgccagaggagaaagaaaagagaagcagagttggagggc
ctttaaacaatactattcacgataggggtttatccactcttatagactggaa
agataaggatgctatgggaagaactttagaccctaagagaagacttga
ggcattgagatggagaaagtggcaaattaga
Key (color-coded in the original figure): ATG; stop codon
enzymes
Non-enzymes
DNA sequence
blastn--compares a nucleotide query sequence against a nucleotide sequence
database
Three scores
1) percent identity
2) similarity score
3) E-value--number of matches with an equal or better score
expected by chance in a database of this size (lower number, higher probability
of evolutionary homology, higher probability of similar function)
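Of the three scores, percent identity is the simplest to compute once two sequences are aligned. The sketch below assumes the alignment is already given as two equal-length strings with "-" for gaps, and divides by the alignment length; other denominators are also in use, so treat this convention as an assumption.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns in which both sequences match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns that are gaps in both sequences.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)

# Made-up alignment: six identical columns, one gap, one mismatch.
identity = percent_identity("MKCPYCKS", "MKCP-CKT")
```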
What is the E-value?
The E value is the number of matches with an equal or better score
that one would expect to find by chance in a database of the current
size; the lower the E value, the more significant the match.
Essentially, the E value describes the random background noise
that exists for matches between sequences. For example, an E value
of 1 assigned to a hit can be interpreted as meaning that in a
database of the current size one might expect to see 1 match with
a similar score simply by chance.
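The relationship between score, search-space size, and E value can be sketched with the Karlin-Altschul form E = m * n * 2^(-S'), where S' is the bit score, m the query length, and n the total database length. The function names are illustrative, and the lengths used in the example are made-up numbers, not values from any real search.

```python
import math

def bit_score(raw_score, lam, k):
    """Normalize a raw alignment score using the scoring-system constants
    lambda and K (both depend on the substitution matrix and gap costs)."""
    return (lam * raw_score - math.log(k)) / math.log(2)

def e_value(bits, query_len, db_len):
    """Expected number of chance hits scoring at least `bits`."""
    return query_len * db_len * 2.0 ** (-bits)

def p_value(e):
    """Probability of seeing at least one such chance hit."""
    return 1.0 - math.exp(-e)

e_small_db = e_value(20, 1024, 1024)        # toy search space
e_large_db = e_value(20, 1024, 1024 * 100)  # same hit, 100x larger database
```

The same alignment becomes less significant as the database grows, which is why an E value is always relative to "a database of the current size".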
Multi-domain proteins
Computer calls
GNNTNNTGTGNCGGATACAATTCCCCTCTAGAAATAATTTTGTTTAACTTTAAGAAGGAGATATACATATGCACCACCAC
CACCACCACCCCATGGGTATGAATAAGCAAAAGGTTTGTCCTGCTTGTGAATCTGCGGAACTTATTTATGATCCAGAAAG
GGGGGAAATAGTCTGTGCCAAGTGCGGTTATGTAATAGAAGAGAACATAATTGATATGGGTCCTAAGTGGCGTGCTTTTG
ATGCTTCTCAAAGGGAACGCAGGTCTAGAACTGGTGCACCAGAAAGTATTCTTCTTCATGACAAGGGGCTTTCAACTGCA
ATTGGAATTGACAGATCGCTTTCCGGATTAATGAGAGAGAAGATGTACCGTTTGAGGAAGTGGCANTCCANATTANGAGT
TAGTGATGCAGCANANAGGAACCTAGCTTTTGCCCTAAGTGAGTTGGATAGAATTNCTGCTCAGTTAAAACTTCCNNGAC
ATGTAGAGGAAGAAGCTGCAANGCTGNACANAGANGCAGNGNGANAGGGACTTATTNGANGCAGATCTATTGAGAGCGTT
ATGGCGGCANGTGTTTACCCTGCTTGTAGGTTATTAAAAGNTCCCGGGACTCTGGATGAGATTGCTGATATTGCTAGAGC
Find the open reading frame(s)
Translate it:
MKCPYCKSRDLVYDRQHGEVFCKKCGSILATNLVDSEL
SRKTKTNDIPRYTKRIGEFTREKIYRLRKWQKKISSERNL
VLAMSELRRLSGMLKLPKYVEEEAAYLYREAAKRGLTR
RIPIETTVAACIYATCRLFKVPRTLNEIASYSKTEKKEIMK
AFRVIVRNLNLTPKMLLARPTDYVDKFADELELSERVRR
RTVDILRRANEEGITSGKNPLSLVAAALYIASLLEGERRS
QKEIARVTGVSEMTVRNRYKELA
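Translating an open reading frame is a codon-by-codon table lookup. The sketch below assumes the standard genetic code and reports any codon containing an uncalled base (N) as "X"; the function name is illustrative. The example input is the stretch following the first ATG in the computer-called read shown earlier.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate(orf):
    """Translate a DNA ORF (starting at its first codon) to one-letter
    amino acid codes, stopping at the first stop codon."""
    orf = orf.upper()
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")  # X = ambiguous codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# ATG, six CAC codons, then CCC ATG GGT ATG AAT AAG.
peptide = translate("ATG" + "CAC" * 6 + "CCCATGGGTATGAATAAG")
```

The run of six histidines at the start of this peptide is what a 6xHis tag looks like at the sequence level.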
BLAST against (go to genomes page):
-- Microbial genomes
-- environmental sequences (genomes)
Results:
2) A large percentage of
predicted proteins cannot be
assigned a function based on
homology
For a current list of databases and bioinformatics tools
see: Nucleic Acids Research annual bioinformatics issue
(comes out every January).
http://www.oxfordjournals.org/nar/database/cap/
RNA
“proteome”
protein
The value of DNA microarrays for
studying gene expression
1) Study all transcripts at the same time
The hybridization
represents the
measurement
A print head for generating arrays of
probes
Printing needles
A yeast array experiment
Vegetative cells and sporulating cells
Isolate mRNA
Prepare fluorescently labeled cDNA with two different-colored fluors
Hybridize, then read out
Example microarray data
Green: mRNA
more abundant in
vegetative cells
Yellow: equivalent
mRNA abundance in
vegetative and
sporulating cells
(phase in which
gene is expressed)
High mRNA
levels
low mRNA
levels
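Two-color array data are commonly summarized as a log2 ratio of the two channel intensities for each spot. The sketch below assumes background-corrected intensities and a two-fold cutoff for calling a difference; the function names, the cutoff, and the intensity values are illustrative assumptions.

```python
import math

def log2_ratio(sporulating, vegetative):
    """log2 of sporulating over vegetative signal for one spot."""
    return math.log2(sporulating / vegetative)

def classify(sporulating, vegetative, fold=2.0):
    """Label a spot by which sample has more of the mRNA."""
    ratio = log2_ratio(sporulating, vegetative)
    cutoff = math.log2(fold)
    if ratio >= cutoff:
        return "sporulating"   # more abundant in sporulating cells
    if ratio <= -cutoff:
        return "vegetative"    # more abundant in vegetative cells (green)
    return "equivalent"        # similar abundance in both (yellow)

calls = [classify(400, 100), classify(100, 400), classify(120, 100)]
```

Working in log2 makes a two-fold increase (+1) and a two-fold decrease (-1) symmetric around zero.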
MIAME:
The Minimum Information About a Microarray Experiment
Analysis of the proteome: “proteomics”
• Which proteins are present and when?
• What are the proteins doing?
– What interacts with what?
• Protein-DNA interactions (chromatin
immunoprecipitation)
• Protein-protein interactions
– Functions of proteins?
From J.R. Yates 1998 “Mass spectrometry and the age of the
proteome” J Mass Spec. 33, p 1-19
Defining protein function
• Classical methods:
– Define activity of protein, develop an assay for activity
• Biochemistry: use assay to purify protein from cell,
characterize structure/function of protein in vitro
• Genetics: obtain mutants with change in activity,
characterize phenotype of mutant, obtain suppressors
to identify genes that interact with protein of interest
– Time intensive, expensive
Protein activity at the proteome level
• Protein-DNA interactions: identifying binding sites
for DNA-binding proteins: regulation of gene
expression
4) Determine presence of
DNA by quantitative PCR
V. Orlando (2000) TIBS 25, p. 99
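One common way to turn the quantitative PCR readout into an enrichment figure is to compare threshold cycles (Ct) between the immunoprecipitated sample and a control sample. This is a sketch under the assumption that product doubles every cycle; the function name, the choice of control, and the Ct values are hypothetical and not taken from the cited paper.

```python
def fold_enrichment(ct_ip, ct_control, efficiency=2.0):
    """Fold enrichment of a DNA region in the IP relative to a control.

    A lower Ct means more starting template, so each cycle of difference
    corresponds to one factor of `efficiency` (2.0 = perfect doubling).
    """
    return efficiency ** (ct_control - ct_ip)

# Made-up values: the IP sample crosses threshold 3 cycles earlier.
enrichment = fold_enrichment(ct_ip=24.0, ct_control=27.0)
```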
Massively parallel ChIP
Proteins immobilized, usually by virtue of a tag sequence (6xHis tag, biotin, etc.)