
Medium and Large Scale Correlations of RNA and Protein Expression Information in Yeast

Lecture 7
Tuesday, Jan. 27, 2015

Part 1:
C1
Correlation between protein and
mRNA abundance in yeast.
Gygi SP, Rochon Y, Franza BR, & Aebersold R.

Mol. Cell Biol., 1999, 19(3), 1720-30

Large Scale Correlation of Protein and mRNA Levels in Yeast
Gygi SP, Rochon Y, Franza BR, Aebersold R.
Correlation between protein and mRNA abundance in yeast.
Mol. Cell Biol. 1999 Mar;19(3):1720-30

Protein separation by 2-Dimensional Gels.


Protein identification by tandem Mass
Spectrometry (MS/MS).
Protein quantification by 2D Gel (35S-Met).
Prediction of Protein Abundance: Codon Bias
mRNA quantification by SAGE technology.

RNA quantitation methodology (circa 1999)

Microarrays (1): experiment vs. control, per ORF. Output: R/G ratios; R, G values; quality indicators.
Affymetrix (2): 25-mer PM and MM features per ORF. Output: averaged PM-MM; presence; feature statistics.
SAGE (3): ORF SAGE tags in concatamers. Output: counts of SAGE 14-mer sequence tags for each ORF.

(1) DeRisi, et al., Science 278:680-686 (1997)
(2) Lockhart, et al., Nat Biotech 14:1675-1680 (1996)
(3) Velculescu, et al., Serial Analysis of Gene Expression, Science 270:484-487 (1995)

Protein quantification methodology:


2-D gels with densitometry (35S-Met)

2D Silver-Stained Gel of the Proteins in Yeast Cell Lysate

All predicted Yeast Proteins

Yeast Proteins observed in study

Codon
Bias

Predicted stability (by N-end rule) of proteins observed in study

Small Vignette on Codon Bias:

Codon bias data is calculated by the method of
Bennetzen and Hall (1982). The maximum value
of codon bias (CB) is 1.0, and the minimum value
can be negative. Higher values indicate a greater
degree of gene expression.
Bennetzen, J.L. & Hall, B.D. Codon selection in
yeast. J. Biol. Chem. 1982 Mar 25;257(6):3026-31

Codon usage in a reference set of genes (that are known to
be highly expressed, e.g. ribosomal proteins) allows us
to assign a coefficient to every codon based on its usage
frequency.
For any gene, its CAI score is the geometric mean of the
coefficients for all codons used in the gene.
CAI score of 1:
The gene uses the same codons as the reference set of genes that
we know are highly expressed

CAI score of 0:
The gene uses different codons from the reference set of genes
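
The CAI recipe above (a coefficient per codon from a reference set, then a geometric mean over a gene) can be sketched in a few lines. This is a minimal illustration, not the method of any particular tool. It assumes the coefficient of a codon is its count in the reference set divided by the count of the most-used synonymous codon, and it adds a small pseudocount (an assumption) so that unseen codons do not give log(0). The two-amino-acid codon table and the reference sequences are made up.

```python
import math
from collections import Counter

# Toy synonymous-codon families: a hypothetical subset, not the full genetic code.
SYNONYMS = {
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
    "Lys": ["AAA", "AAG"],
}

def split_codons(seq):
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

def codon_weights(reference_seqs, pseudocount=0.5):
    """Coefficient of a codon = its count / count of the most-used synonym (assumed definition)."""
    counts = Counter(c for s in reference_seqs for c in split_codons(s))
    weights = {}
    for codons in SYNONYMS.values():
        adjusted = {c: counts[c] + pseudocount for c in codons}
        top = max(adjusted.values())
        for c in codons:
            weights[c] = adjusted[c] / top
    return weights

def cai(seq, weights):
    """Geometric mean of the coefficients of all scored codons in seq."""
    logs = [math.log(weights[c]) for c in split_codons(seq) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))

# Made-up "highly expressed" reference genes (GGT and AAG dominate).
reference = ["GGTAAGGGTAAGGGTAAA", "GGTGGTAAGAAGGGC"]
weights = codon_weights(reference)

preferred = cai("GGTAAGGGTAAG", weights)  # only the most-used codons
rare = cai("GGGAAAGGAAAA", weights)       # only rarely used codons
print(preferred, rare)
```

With the pseudocount, a gene built entirely from unused codons scores close to 0 rather than exactly 0, while a gene built from the most-used codons scores 1.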

http://phage.cisat.jmu.edu/bioinformatics/wp-content/uploads/2011/04/codon_table2.jpg

Sweetened carbonated beverage


Soda
Pop
Coke
Soft drink
Percent of codons genome-wide
(Escherichia coli)

Proportion of Glycine codons genome-wide

The codon adaptation index (CAI) model can predict expression levels of
native E. coli proteins
[Figure: protein abundances vs. CAI score; R^2 = 0.38]

Does codon usage bias vary with position along the length of a gene? (in E. coli)
[Figure: codon usage vs. codon position; reference lines at 57% global usage and 43% global usage]

Codon Usage in S. cerevisiae (Back to Ref. C1!)

Correlation between Protein and mRNA Levels

Correlation of Abund.
Correlation with Increasing Protein Abundance

Correlation of mRNA and Protein Expression Levels
Protein copy numbers ranged from 2,200 to 863,000 copies/cell.
mRNA copy numbers ranged from 0.7 to 473 copies/cell.
The Pearson product-moment correlation coefficient for 106 genes
was 0.935.
73 genes (69%) were below 10 copies/cell
(mRNA), and their Pearson correlation coefficient was 0.356.
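
The gap between the two coefficients above (0.935 for all genes, 0.356 for the low-abundance subset) is what one would expect when a few very abundant genes dominate a Pearson correlation. The sketch below illustrates that effect with made-up numbers; the values are NOT data from the paper, and the helper name `pearson` is just for this example.

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up (mRNA copies/cell, protein copies/cell) pairs, for illustration only.
low = [(1, 40000), (2, 5000), (3, 60000), (4, 8000),
       (5, 30000), (6, 70000), (8, 10000), (9, 45000)]
high = [(150, 400000), (300, 650000), (470, 860000)]

r_all = pearson(*zip(*(low + high)))  # a few abundant genes included
r_low = pearson(*zip(*low))           # genes below 10 mRNA copies/cell only

print(f"all genes:          r = {r_all:.3f}")
print(f"low-abundance only: r = {r_low:.3f}")
```

In this toy set the three abundant genes stretch the range of both axes, so the overall coefficient is high even though mRNA level says little about protein level within the low-abundance group.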

Part 2:
C2
Protein pathway and complex
clustering of correlated mRNA and
protein expression analyses in
Saccharomyces cerevisiae
Washburn, et al.

PNAS, 2003, 100, 3107-3112



Abstract

The mRNA and protein expression in Saccharomyces cerevisiae cultured in
rich or minimal media was analyzed by oligonucleotide arrays and
quantitative multidimensional protein identification technology. The overall
correlation between mRNA and protein expression was weakly positive
with a Spearman rank correlation coefficient of 0.45 for 678 loci. To place
the data sets in a proper biological context, a clustering approach based on
protein pathways and protein complexes was implemented. Protein
expression levels were transcriptionally controlled for not only single loci
but for entire protein pathways (e.g., Met, Arg, and Leu biosynthetic
pathways). In contrast, the protein expression of loci in several protein
complexes (e.g., SPT, COPI, and ribosome) was posttranscriptionally
controlled. The coupling of the methods described provided insight into the
biology of S. cerevisiae and a clustering strategy by which future studies
should be based.

Quantitative Analysis of Protein Expression

cDNA and oligonucleotide arrays reveal the relative
changes in mRNA abundance, an indirect measure of
protein expression.
Complex post-transcriptional regulatory mechanisms
have the potential to uncouple mRNA and protein
expression levels.
Quantitative proteomics allows for the direct analysis
of protein expression changes.
Several methods: ICAT, digestion in 18O water,
SILAC, metabolic labeling (15N vs. 14N), etc.

Quantitative Proteomics
[Figure: workflow diagram]
Cell growth condition A and cell growth condition B: one sample carries the light label, the other the heavy label
50/50 mix of heavy and light labeled proteins (ProteinX from A, ProteinX from B)
Digest mixture and resolve peptides
Analyze by mass spectrometry
[Spectrum panel: parent ion intensity vs. mass/charge, with paired peaks for PeptideX from ProteinX from A and PeptideX from ProteinX from B]

Quantitative proteomic methods have different entry points into this scheme
(We will cover this in detail in future lectures!)
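
The last step of the scheme above reads a relative abundance off paired light and heavy parent-ion peaks. A minimal sketch of that arithmetic follows. It assumes each peptide contributes one (light, heavy) intensity pair and that the protein-level ratio is summarized as the median of the peptide ratios; both the summary choice and the intensity values are assumptions for illustration.

```python
from statistics import median

def protein_ratio(peptide_pairs):
    """Median light/heavy intensity ratio over the peptide pairs of one protein.

    peptide_pairs: iterable of (light_intensity, heavy_intensity) tuples.
    Pairs with a missing (zero) peak are skipped.
    """
    ratios = [light / heavy for light, heavy in peptide_pairs if light > 0 and heavy > 0]
    if not ratios:
        raise ValueError("no usable peptide pairs")
    return median(ratios)

# Made-up parent-ion intensities for three peptides of one protein.
# Here light = 14N (rich media) and heavy = 15N (minimal media), so the ratio is rich/min.
pairs = [(2.0e6, 2.8e7), (1.5e6, 2.1e7), (3.0e6, 4.5e7)]

rich_over_min = protein_ratio(pairs)
fold_increase_in_minimal = 1.0 / rich_over_min
print(f"rich/min = {rich_over_min:.3f}, increase in minimal media = {fold_increase_in_minimal:.1f}x")
```

Using a median rather than a mean keeps one badly quantified peptide from dominating the protein ratio, which is one reasonable choice among several.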

Correlated mRNA and Protein Expression Analyses in S. cerevisiae
Grow S. cerevisiae in 15N minimal media and rich
media (14N)
Control is 15N enriched minimal media vs. normal (14N)
minimal media

Prepare protein extracts and mix equal amounts of 15N
and 14N labeled proteins
Two samples: Min. vs. Rich and Min. vs. Min.

Reproducibly detected, identified, and quantified the
protein expression levels of 678 loci.
How do the data correlate to mRNA?
Wodicka et al. Nat. Biotech, 1997, 15, 1359-1367.
Oligo arrays were also run for the same dataset as part of this study.
Spearman Rank Correlation Analysis

Full dataset Sr = 0.45

The major challenge of large coupled datasets is assembling the data into a biological
context.
Proteins function in pathways and complexes
Clustered loci by protein pathways and complexes according to YPD and MIPS
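
The Spearman rank correlation reported above is a Pearson correlation computed on ranks instead of raw values, so it depends only on the ordering of the loci. A minimal sketch, with tied values given their average rank; the five mRNA and protein ratios are made up for illustration.

```python
import math

def average_ranks(values):
    """1-based ranks of values; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

# Made-up expression ratios for five loci, for illustration only.
mrna = [0.1, 0.5, 1.0, 2.0, 12.0]
protein = [0.2, 0.4, 3.0, 1.5, 14.0]

rho = spearman(mrna, protein)
print(f"Spearman rank correlation = {rho:.2f}")
```

Because only ranks enter the calculation, a single locus with an extreme ratio cannot dominate the result the way it can with a Pearson coefficient on raw values.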

MET6: homocysteine methyltransferase, in methionine synthesis (CAI = 0.658)
[Figure: full-scan ESI mass spectrum labeled "MET6". Scan header: Mike1224001 #1948-2016 RT: 52.25-54.14 AV: 18 NL: 2.08E8; T: + c ESI Full ms [400.00-1400.00]. x-axis: m/z (400-1400); y-axis: Relative Abundance (0-100).]

-in three separate runs
-14 x protein increase in minimal media
-12 x on oligo array

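mRNA and Protein Expression Ratios of Methionine Biosynthetic Pathway
[Figure: pathway diagram with each enzyme shaded by expression ratio (Rich/Min); legend label: RP]
Expression (Rich/Min) legend bins: 0.09-0.05, 0.12-0.1, 0.16-0.125, 0.24-0.17, 0.49-0.25, 2.0-0.5, 2.0-4.0, 4.1-6.0, 6.1-8.0, 8.1-10.0, 10.1-20.0
Main pathway: intracellular sulfate -> (MET3) -> 5-adenylylsulfate -> (MET14) -> 3-phospho-5-adenylylsulfate -> (MET16) -> sulfite -> (ECM17, MET10) -> sulfide -> (MET17) -> homocysteine -> (MET6) -> methionine -> (SAM1, SAM2) -> S-adenosylmethionine
Cysteine branch: homocysteine -> (CYS4) -> cystathionine -> (CYS3) -> cysteine
Methyl (CH3) donor for MET6: 5,10-methylenetetrahydrofolate -> (MET13) -> 5-methyltetrahydrofolate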

ARG1: argininosuccinate synthetase (CAI = 0.395)
[Figure: full-scan ESI mass spectrum labeled "ARG1". Scan header: Mike1224003 #1333-1377 RT: 38.76-39.91 AV: 12 NL: 8.69E7; T: + c ESI Full ms [400.00-1400.00]. x-axis: m/z (400-1400); y-axis: Relative Abundance (0-100).]

-in three separate runs
-9 x protein increase in minimal media
-8 x on oligo array

Protein Pathway mRNA and Protein Expression

-In each case, the mRNA and protein abundance of entire pathways correlated.

Protein Complexes Clustering

As with protein pathways, grouped loci based on
presence in protein complexes.
Based on biochemically defined complexes in literature
(YPD and MIPS)
The protein expression of multiple components in
several protein complexes was post-transcriptionally
regulated.
Interesting Point: What does not change is as important
as what does change.
This underscores the importance of comprehensive analyses

Protein Complex Expression

Protein Complex Name     Loci Characterized        mRNA Expression (rich/min)   Protein Expression (rich/min)
Cytochrome bc1           COR1, QCR2, 7             0.67 ± 0.09                  0.42 ± 0.14
COPI                     COP1, SEC27, SEC28        1.16 ± 0.10                  0.42 ± 0.10
GTPases                  ARF1                      1.14 ± 0.17                  2.11 ± 0.26
                         ARF2                      0.66 ± 0.20                  1.19 ± 0.82
GEFs                     GEA1                      0.79 ± 0.39                  0.33 ± 0.23
                         GEA2                      0.94 ± 0.47                  0.72 ± 0.29
                         SYT1                      1.25 ± 0.55                  0.89 ± 0.22
COPII                    SEC13                     1.19 ± 0.50                  0.58 ± 0.19
                         SEC23                     1.33 ± 0.45                  0.46 ± 0.23
                         SEC24                     1.18 ± 0.30                  0.97 ± 0.52
GTPase                   SAR1                      0.81 ± 0.40                  0.96 ± 0.10
Vacuolar H(+) ATPase     VMA1, 4, 6, 13, VPH13     0.93 ± 0.10                  0.48 ± 0.09
                         VMA2                      1.01 ± 0.10                  1.72 ± 0.44
                         VMA5                      0.89 ± 0.22                  0.99 ± 0.15
Ribosome                 40S (N=28)                1.33 ± 0.15                  1.41 ± 0.27
                         60S-1 (N=28)              1.31 ± 0.15                  1.37 ± 0.37
                         60S-2 (N=8)               1.39 ± 0.18                  3.06 ± 0.94

Note: RPL5, 6B, 8A, 10, 13A/B, 21A/B, 31A/B, 33A/B
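
One simple way to read the table above is to ask, row by row, whether the mRNA and protein ratios agree within their spreads. The sketch below does that for four rows taken from the table. It assumes the second number in each cell is a spread (for example a standard deviation) around the first, and it uses a non-overlap rule that is purely illustrative, not the criterion used in the study.

```python
# (label, mRNA mean, mRNA spread, protein mean, protein spread), rich/min ratios
# copied from the table; treating the second number as a spread is an assumption.
rows = [
    ("COPI", 1.16, 0.10, 0.42, 0.10),
    ("VMA5", 0.89, 0.22, 0.99, 0.15),
    ("40S", 1.33, 0.15, 1.41, 0.27),
    ("60S-2", 1.39, 0.18, 3.06, 0.94),
]

def intervals_overlap(mean_a, spread_a, mean_b, spread_b):
    """True if [mean_a ± spread_a] and [mean_b ± spread_b] share any value."""
    return (mean_a - spread_a) <= (mean_b + spread_b) and \
           (mean_b - spread_b) <= (mean_a + spread_a)

# Hypothetical rule: a row is "discordant" when the two intervals do not overlap,
# i.e. the protein change is not explained by the mRNA change.
discordant = [label for label, m, ms, p, ps in rows
              if not intervals_overlap(m, ms, p, ps)]

for label, m, ms, p, ps in rows:
    status = "discordant" if label in discordant else "concordant"
    print(f"{label:6s} mRNA {m:.2f}±{ms:.2f}  protein {p:.2f}±{ps:.2f}  {status}")
```

Under this rule COPI and 60S-2 stand out: their mRNA ratios sit near 1 while their protein ratios move away from it, which is the pattern the slides describe as post-transcriptional regulation.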

Expression Changes of RNA PolII Control Proteins
Tup1p

Anc1p

Sin4p

Expression Ratios of RNA PolII Control Proteins

Protein Complex Name     Loci Characterized             mRNA Expression (rich/min)   Protein Expression (rich/min)
RNA Pol II Holoenzyme    ANC1, SIN4, SSU72              0.94 ± 0.26                  0.30 ± 0.10
SPT complex              SPT4                           0.77 ± 0.75                  0.24 ± 0.06
                         SPT5                           1.17 ± 1.19                  0.58 ± 0.32
                         SPT6                           1.94 ± 0.75                  1.28 ± 0.35
Histone Modification     EAF3, SIN3, CPR1, YIL112W      1.10 ± 0.21                  0.35 ± 0.17
Repression               TUP1                           1.10 ± 0.27                  8.39 ± 0.53
SIN3 binding             STB2                           0.23 ± 1.8                   24.0 ± 3.6

Figure annotations:
-Part of mediator core RNA pol II complex
-general transcriptional repressor

Increased Diversity of mRNA Overexpression in Minimal Media
From the oligo array analysis
A greater diversity of mRNAs is overexpressed in minimal
media vs. rich media
In an analysis by Wodicka et al., 140 mRNAs were >5x
overexpressed in minimal media while 36 mRNAs were >5x
overexpressed in rich media
In our analysis, 80 mRNAs were >5x overexpressed in minimal
media vs. 36 mRNAs >5x overexpressed in rich media.

The proteomics dataset alone provided a possible
explanation for this observation
A decrease in repression and an increase in RNA Pol II machinery
provide a mechanism for these observations
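
The >5x counts quoted above come from thresholding expression ratios in both directions. A minimal sketch of that tally, assuming ratios are stored as rich/min so that a value above 5 means overexpression in rich media and a value below 1/5 means overexpression in minimal media; the locus names and values are made up.

```python
def count_overexpressed(rich_over_min, fold=5.0):
    """Count loci more than `fold`-times overexpressed in each medium.

    rich_over_min: dict mapping locus name -> rich/min expression ratio.
    Returns (count overexpressed in minimal media, count overexpressed in rich media).
    """
    in_minimal = sum(1 for r in rich_over_min.values() if r < 1.0 / fold)
    in_rich = sum(1 for r in rich_over_min.values() if r > fold)
    return in_minimal, in_rich

# Made-up ratios for hypothetical loci, for illustration only.
ratios = {
    "locusA": 0.08,  # 12.5x higher in minimal media
    "locusB": 0.15,
    "locusC": 1.10,  # essentially unchanged
    "locusD": 6.20,  # 6.2x higher in rich media
    "locusE": 0.19,
}

n_minimal, n_rich = count_overexpressed(ratios)
print(f">5x in minimal media: {n_minimal}, >5x in rich media: {n_rich}")
```

Applied to a full array dataset, the same tally gives the pair of counts compared on the slide for each study.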

Correlated mRNA and Protein Expression Analyses in S. cerevisiae
Correlated analyses require contextual
interpretations
Focusing on pathways and complexes was
key
Time-dependent analyses will likely require
clustering algorithms, but don't forget the
context

Toward Integrated Omics:
Medium and Large Scale Correlations
of RNA and Protein Expression
Information in Yeast

Lecture 7
Tuesday Feb. 4, 2014

[Figure: central dogma diagram. Nodes: DNA, RNA, Protein, Post-translational Modification, ?, ?, Phenotype. Layer labels: Mutations, Polymorphisms (Epigenetics); Transcriptomics; Proteomics, PTMs.]
Correlation of Complex Phenotypes to Levels of Biopolymers Across the Central Dogma

Exam #1 is on Feb. 10th!