
PROTEOMICS

LECTURE
The “omics” nomenclature…
Genomics: DNA (gene)
   ↓ Transcription
Transcriptomics: RNA
   ↓ Translation
Proteomics: PROTEIN
   ↓ Enzymatic reaction
Metabolomics: METABOLITE
(Functional genomics spans the transcript and protein levels.)
A few definitions…

~ome = sequence of a complete set of:
Genome: genes
Transcriptome: transcripts
Proteome: proteins
Metabolome: metabolites

~omics = analysis of the:
Genomics: genome
Transcriptomics: transcriptome
Proteomics: proteome
Metabolomics: metabolome
Why study protein expression?
(The steps of gene expression control)
(Gygi et al., Mol. Cell. Biol., 1999, p.1720-1730)
[Figure: control points in gene expression.
Nucleus: DNA → primary RNA transcript (transcriptional control) → mRNA (RNA processing control).
Cytosol: mRNA after export (RNA transport control) → protein (translation control) → modified protein (post-translational control); mRNA → inactive mRNA (RNA degradation control).]
Applications of Proteomics
• Mining: identification of proteins (catalog
the proteins)
• Protein-expression profile: identification of
proteins in a particular state of the
organism
• Protein-network mapping: protein
interactions in living systems
• Mapping of protein modifications: how and
where proteins are modified.
Protein classes for analysis
• Membrane
• Soluble proteins
• Nuclear
• Chromosome-associated
• Phosphorylated
• Glycosylated
• Complexes
Proteomics and genomics are inter-dependent
[Figure: Genomics side: genome sequence → mRNA → primary protein products → functional protein products; determination of gene
Proteomics side: protein fractionation, 2-D electrophoresis, protein identification, post-translational modification.]
Aims of Proteomics
• Detect the different proteins expressed by a tissue, cell
culture, or organism using 2-Dimensional Gel Electrophoresis
• Store that information in a database
• Compare expression profiles between a healthy
cell and a diseased cell
• The data comparison can then be used for
testing and rational drug design.
Gel Electrophoresis
• Motion of charged molecules in an electric field.
• Polyacrylamide gel provides a porous matrix
– (PAGE – Polyacrylamide Gel Electrophoresis)
• Sample is stained with Coomassie blue to make it
visible in the gel.
• Sample placed in wells on the gel.
1-D Gel electrophoresis

• Separation in only 1 dimension: size.
• Smaller molecules travel further through the gel
than large molecules, hence the separation.
1-D continued
• Electric field across gel separates molecules.
– Negatively charged molecules travel towards the
positive terminal and vice-versa.
– Western blotting (protein) is not to be confused with Southern
blotting (DNA) or Northern blotting (RNA).
• Proteins are treated with the denaturing detergent SDS
(sodium dodecyl sulfate) which coats the protein with
negative charges, hence SDS-PAGE.
2-D – Separation is based on
size and charge
• First step is to separate based on charge or
isoelectric point, called isoelectric focusing.
• Then separate based on size (SDS-PAGE).
Isoelectric Focusing
• The isoelectric point (pI) is the pH at which the net
charge of the protein molecule is zero.
• Different proteins have different isoelectric
points.
• Isoelectric point is found by drawing the sample
through a stable pH gradient.
• The range of the gradient determines the
resolution of the separation.
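As a rough illustration of what the isoelectric point means, the sketch below estimates the pI of a peptide by searching for the pH at which its computed net charge is zero. The pKa values are approximate textbook values assumed for this sketch (published pKa sets differ), and the function names are hypothetical; none of this comes from the lecture itself.

```python
# Approximate pKa values (assumed for illustration; published sets differ).
PKA_POSITIVE = {"N_term": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"C_term": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 0.0
    # Basic groups carry +1 when protonated.
    for group in ["N_term"] + [aa for aa in sequence if aa in PKA_POSITIVE]:
        charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[group]))
    # Acidic groups carry -1 when deprotonated.
    for group in ["C_term"] + [aa for aa in sequence if aa in PKA_NEGATIVE]:
        charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[group] - ph))
    return charge


def isoelectric_point(sequence, tolerance=1e-4):
    """Bisection search for the pH where the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: pI lies at a higher pH
        else:
            high = mid  # negative (or zero): pI lies at a lower pH
    return (low + high) / 2.0
```

Under these assumptions a lysine-rich peptide such as "KKKK" comes out with a higher pI than an aspartate-rich one such as "DDDD", which is the property isoelectric focusing exploits.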
SDS-PAGE
• Second Dimension.
• Separation by size.
• Run perpendicular to Isoelectric focusing.
• The only unresolved proteins after the first and
second dimensions are those proteins with the
same size and same charge – rare!
2-D Proteomics Example
2D-PAGE Analysis Software
• 2D-PAGE technology has been in use for over 20 years,
and potentially provides a vast amount of information
about a protein sample.
• However, due to difficulties with data analysis, it remains
only partially exploited.
Analysis problems

• It can be very difficult to compare the results of two
experiments to yield a differential expression profile:
• There can be severe warping of the gel due to
– uneven coolant flow
– voltage leaks
– tears in gel
• There can be problems with normalisation of
– background
– spot intensity
• There can be differences in sample preparations.
Current state of software
• Correct identification and alignment of spots from the two
gels has generally been a process with a lot of manual
intervention - hence very slow.
• The processing power available with today’s PCs means
that automated analysis is starting to become possible.
• One vendor claims that an experienced user of their package
can compare and annotate 4 gel pairs per hour.
Automated gel matching
• Gel matching, or “registration”, is the process of
aligning two images to compensate for warp.
• Some packages still require the user to identify
corresponding spots to help with gel matching.
• The Z3 program from Compugen has a fully-
automated gel matching algorithm:
– define set of small, unique rectangles.
– compute optimal local transformations for rectangles.
– Interpolate to make smooth global transformation.
• Note that this makes use of spot shape, streaks,
smears and background structure, which other
programs discard.
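The third step above (blending local transformations into a smooth global one) can be sketched in a few lines. This is not the Z3 algorithm, whose details are not given here; it swaps in plain inverse-distance weighting of per-rectangle shift vectors, and all names are hypothetical.

```python
def interpolate_shift(point, local_shifts, power=2.0):
    """Blend local (dx, dy) shifts into one shift at `point`.

    local_shifts: list of ((cx, cy), (dx, dy)) pairs, one per rectangle,
    where (cx, cy) is the rectangle centre and (dx, dy) its best local shift.
    """
    x, y = point
    total_weight = 0.0
    sum_dx = 0.0
    sum_dy = 0.0
    for (cx, cy), (dx, dy) in local_shifts:
        dist_sq = (x - cx) ** 2 + (y - cy) ** 2
        if dist_sq == 0.0:
            return (dx, dy)  # exactly on a rectangle centre
        # Nearby rectangles dominate; distant ones contribute little.
        weight = 1.0 / dist_sq ** (power / 2.0)
        total_weight += weight
        sum_dx += weight * dx
        sum_dy += weight * dy
    return (sum_dx / total_weight, sum_dy / total_weight)


def warp_point(point, local_shifts):
    """Map a position on one gel into the coordinate frame of the other."""
    dx, dy = interpolate_shift(point, local_shifts)
    return (point[0] + dx, point[1] + dy)
```

A point midway between two rectangle centres receives the average of their two shifts, so the warp changes gradually across the gel instead of jumping at rectangle borders.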
Spot detection
• Once the gel images have
been matched, the program
automatically detects spots.
Algorithms are generally
based on Gaussian
statistics.
Spot Quantitation
• The positions of detected spots are calibrated to give
a pI / MW pair for each protein.
• A value for the expression level of the protein can be
calculated from the overall spot intensity.
• Some programs do not quantitate each gel separately,
but calculate relative intensity pixel by pixel. This may be
a more accurate approach.
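A minimal sketch of the calibration and quantitation steps, assuming a linear pH gradient along the first dimension and a log-linear relation between molecular weight and migration distance in the second (a common SDS-PAGE approximation). The marker values and function names are hypothetical.

```python
import math


def position_to_pi(x, x_start, x_end, ph_start, ph_end):
    """Linear interpolation of pI along an (assumed linear) pH gradient."""
    fraction = (x - x_start) / (x_end - x_start)
    return ph_start + fraction * (ph_end - ph_start)


def position_to_mw(y, marker_a, marker_b):
    """Estimate MW from migration distance y using two markers.

    Each marker is (y_position, molecular_weight). Assumes log10(MW)
    changes linearly with migration distance.
    """
    (ya, mwa), (yb, mwb) = marker_a, marker_b
    slope = (math.log10(mwb) - math.log10(mwa)) / (yb - ya)
    return 10 ** (math.log10(mwa) + slope * (y - ya))


def spot_volume(pixels, background=0.0):
    """Sum of background-subtracted pixel intensities inside a spot."""
    return sum(max(value - background, 0.0) for row in pixels for value in row)
```

For example, `position_to_pi(50, 0, 100, 4.0, 10.0)` places a spot halfway along a pH 4 to 10 strip at pI 7.0.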
Differential Expression
• The user can set threshold
values for the detection of
differential expression. This
helps reduce the amount of
information displayed at
once.
• In this example, a protein
expressed only in the
second sample is circled in
red. The yellow circles show
proteins which are
differentially expressed.
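The thresholding idea can be sketched as a simple fold-change test on matched spots. The threshold value, the labels and the function name are assumptions made for illustration; real packages use their own criteria.

```python
def classify_spots(sample_a, sample_b, fold_threshold=2.0):
    """Flag matched spots whose intensity ratio exceeds a user threshold.

    sample_a, sample_b: dicts mapping spot id -> normalised intensity.
    Returns a dict mapping spot id -> label.
    """
    labels = {}
    for spot_id in sorted(set(sample_a) | set(sample_b)):
        a = sample_a.get(spot_id, 0.0)
        b = sample_b.get(spot_id, 0.0)
        if a == 0.0 and b == 0.0:
            labels[spot_id] = "absent"
        elif a == 0.0:
            labels[spot_id] = "only in B"   # e.g. the red-circled case
        elif b == 0.0:
            labels[spot_id] = "only in A"
        elif b / a >= fold_threshold:
            labels[spot_id] = "up in B"
        elif a / b >= fold_threshold:
            labels[spot_id] = "down in B"
        else:
            labels[spot_id] = "unchanged"
    return labels
```

Raising `fold_threshold` shrinks the set of flagged spots, which is how the user controls how much is displayed at once.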
Annotation
• Some systems allow semi-automatic annotation of spots,
based on a database of proteins listing their pI / MW
values.
• Proteins of interest can also be excised from the gel and
sent on to mass spectrometry for definitive identification.
The ProteomeWorks system from Bio-Rad offers such an
integrated solution for 2D-PAGE and MALDI.
Multi-experiment Analysis
• One useful feature of modern programs is the ability to
collate data from many runs of the same experiment.
• Spots which only appear in one gel are likely to be
artifacts, and are removed from the analysis.
• This is an excellent way to reduce noise and enhance
weak signals.
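A sketch of the replicate filter described above, assuming spots have already been matched across gels and share ids. The `min_gels` cut-off and the function name are hypothetical.

```python
def consensus_spots(replicate_gels, min_gels=2):
    """Average spots seen in at least `min_gels` replicates; drop the others.

    replicate_gels: list of dicts mapping spot id -> intensity, one per gel,
    with spot ids already matched across gels.
    """
    observations = {}
    for gel in replicate_gels:
        for spot_id, intensity in gel.items():
            observations.setdefault(spot_id, []).append(intensity)
    return {
        spot_id: sum(values) / len(values)
        for spot_id, values in observations.items()
        if len(values) >= min_gels
    }
```

Averaging over replicates reduces random noise, while the `min_gels` filter removes spots that appear in a single gel only.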
Links
• Z3 system (Compugen) - http://www.2dgels.com/
• Melanie3 (SIB) - http://us.expasy.org/melanie/
• ProteomWeaver (Definiens) -
http://www.proteomweaver.com/
• PDQuest (Bio-Rad) - http://www.biorad.com/
• Delta2d (Decodon) - http://www.decodon.com/
Introduction to the databases

•With the advent of many 2-D PAGE databases there are a number of
protein spots that are already "identified" in a few cell lines. Combined
with the aims of the experiment, these databases may give one the
opportunity to guess at the identity of a particular protein spot and confirm
or deny this by immunoblotting. The approach of obtaining accurate
peptide masses from specifically cleaved proteins to search protein
sequence databases, known as peptide mass fingerprinting, provides one
with another opportunity to identify a previously sequenced protein or
(hopefully) confirm that it is indeed novel.

An animated SDS PAGE presentation


•A number of 2-D gel databases exist.
•Quantitative databases: S. cerevisiae and REF52.
•Annotative databases: E. coli and human keratinocytes.
•An annual issue of the journal “Electrophoresis” serves as a
major index of these databases (it has links to many of them).
•The best one would obviously be a database that is
regularly updated (e.g. SWISS-2DPAGE).
List of 2-D GEL DATABASES
One can find an extensive list of such databases by following these links.
We will discuss a few “interesting ones”.
•World 2-D PAGE
•NCI-FCRF
•DEAMBULUM-Protein Databases
•Ludwig Institute of Cancer Research
•Phoretix
World 2-D PAGE: Index of 2-D PAGE databases (ExPASy)
•Basically a set of links to various 2-D PAGE databases.
•Has a useful tool called 2-D Hunt, which searches for 2-DE
related sites on the web.
•Databases are indexed as multi-species, mammalia, yeast,
plant, bacteria, viruses and parasites, and cell lines.
Swiss 2-D Page
•Basically a protein databank of 2-D PAGE and SDS-PAGE
reference maps.
•May give the exact location of the protein in the map, or the
region of the map where it is expected, provided the protein has
a SWISS-PROT entry.
•Search options: keywords, accession number, spot clicking,
full text, author, SWISS-2DPAGE spot serial number, SRS.
Most of them are self-explanatory.
•Protein list for a particular reference map (a table, which can be
downloaded). It gives the gene name, protein description, S-2DP
reference number, S-2DP accession number, identification method,
and experimental molecular weights and pIs for each entry found.
•We can also find the position of a protein sequence in all, one,
or selected reference maps. If it is not found, a temporary
virtual entry is created on the ExPASy server.
SWISS 2D-PAGE (contd)
•It gives cross-references to Medline and a few other databases.
•In addition to this textual data, SWISS-2DPAGE provides several 2-D
PAGE images showing the experimentally determined location of the
protein, as well as a theoretical region computed from the protein
sequence, indicating where the protein might be found in the gel.
•GeneBio (Geneva Bioinformatics) offers paid subscriptions to
SWISS-2DPAGE for commercial institutions.
•Vital statistics
- Current release (15.0) has 861 entries in 33 reference maps.
- Sources of reference maps:
- Human (liver, plasma, HepG2, RBC, lymphoma, HepG2 secreted
proteins, CSF, macrophage-like cell line, erythroleukemia cell,
platelet, kidney, promyelocytic leukemia cells, colorectal epithelia
cells, colorectal adenocarcinoma cell line (DLD-1), soluble nuclear
proteins and matrix from liver tissue)
- Mouse (liver, gastrocnemius muscle, pancreatic islet cells, brown
adipose tissue, white adipose tissue, soluble nuclear proteins,
matrix from liver tissue)
- Arabidopsis thaliana
- Dictyostelium discoideum
- Escherichia coli (for 7 pI ranges: 3.5-10, 4-5, 4.5-5.5, 5-6, 5.5-6.7,
6-9, 6-11)
- Saccharomyces cerevisiae
Swiss 2D Page(cont..)
There have been some recent additions to the
database.
SDS-PAGE and 2-D PAGE maps of nuclear proteins from human
HeLa cells have been added to the growing list of
reference maps. It is still an ongoing
project. Information about known proteins found
within that gel stretch has been mapped (see below:
right-SDS, left-PAGE).
Swiss 2D Page(cont)
Some useful abbreviations:
-ID line: comprises the ID, entry name, entry class and the
method (2D gel), in that order. They follow a specific
nomenclature.
-AC line: contains the accession numbers, separated by
semicolons. It is a stable way of identifying entries from release to release.
-DT line: specifies the date (self-explanatory).
-DE line: gives descriptive information about the protein. If the
complete sequence was not determined, the last line reads
“Fragment”.
Some useful abbreviations (cont…)
-IM (Images) line: lists the 2-D PAGE and SDS-PAGE images
associated with the entry. These may be, for example,
TUMORAL LIVER, NORMAL LIVER or just LIVER.
-RP (Reference Position) line: describes the extent of the work carried
out by the author, e.g. protein sequence, amino acid composition,
mapping on gel, characterisation and review.
-The “O” series: organism species (OS), taxonomy (OX) and
classification (OC).
-MT (Master) line: lists the types of maps used (e.g.
plasma, liver).
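Because every line of an entry starts with a two-letter code, such entries are easy to read programmatically. The sketch below groups lines by code. The sample entry is invented for illustration (it is not a real database record), the `//` end-of-entry marker is assumed from the SWISS-PROT flat-file convention, and exact field layouts should be checked against the database's user manual.

```python
def parse_entry(text):
    """Group the lines of one flat-file entry by their two-letter code."""
    fields = {}
    for line in text.splitlines():
        if len(line) < 2 or line.startswith("//"):
            continue  # skip blank lines and the end-of-entry marker
        code, value = line[:2], line[2:].strip()
        fields.setdefault(code, []).append(value)
    return fields


# Hypothetical entry, invented for this sketch (not a real database record).
SAMPLE_ENTRY = """\
ID   EXAMPLE_HUMAN; STANDARD; 2DG.
AC   X00001; X00002;
DE   EXAMPLE PROTEIN (FRAGMENT).
OS   Homo sapiens (Human).
MT   LIVER, PLASMA.
IM   LIVER, PLASMA.
//
"""

entry = parse_entry(SAMPLE_ENTRY)
# Accession numbers are separated by semicolons on the AC line.
accessions = [ac.strip() for ac in entry["AC"][0].split(";") if ac.strip()]
```

Keeping a list per code allows for line types that repeat within one entry.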
Methods used for zeroing in on the identified spots
•Total of 3398 identified spots (as of the latest version).
•Amino acid composition: 5.3% of these spots
•Co-migration: 2.6%
•Gel-matching: 46.7%
•Immunoblotting: 20%
•Microsequencing: 15.5%
•Peptide mass fingerprinting: 26.3%
•Tandem mass spectrometry: 2.3%
Well... does it carry a message?
Browsing the Swiss 2-D Page using spot clicking
-We can get information about a known protein by clicking on one of the
“checks” in the extensive list of image maps available.
-Clicking brings up a tailored image map showing the exact or approximate
position of that protein on each of the image maps available. For obvious
reasons, the best view comes from the reference image map that was
initially clicked.
-A hypertext link can then be used to obtain the full SWISS-PROT entry for
that protein, displaying protein sequence, domain structure, information on
known post-translational processing and modifications, and references.
Image clicking(continued…)
-From SWISS-PROT, the user can select a link to SWISS-3DIMAGE
to see the three-dimensional structure of the protein, if it is known,
submit the sequence to the SWISS-MODEL three-dimensional
modelling tool, or view the domain structure.
- Also, from SWISS-PROT, the user can select links to pertinent
information from DNA sequence databases (EMBL/Genbank),
chromosomal and genomic maps (GDB Genome Database),
bibliographic references and abstracts (Medline), and databases on
the association of human proteins with diseases (OMIM Online
Mendelian Inheritance in Man).
Diagrammatic representation of image clicking...
[Figure: clicking on a spot in the reference map of the colorectal
epithelia cell brings up a screen showing the different image maps
with respect to that protein.]
Diagrammatic representation (cont…)
[Figure: protein identification on the chosen reference map.
The red rectangle is the expected region of the protein on the gel.
Spots are the proteins identified.
Dotted lines are extensions of the possible regions if the protein is
acetylated, phosphorylated or glycosylated.]
Biobase(cont…)

Search options:
Search by protein name, keyword, sample spot number,
relative molecular mass, pI, organelle/component.
Other options, for listing proteins and viewing the gels, are quite
self-explanatory.
Other utilities of the database:
Has links to
-NCBI's Human-Mouse Homology maps through its Mouse 2D-PAGE
databases.
-Interesting studies like Mouse Genome Informatics (Jackson Laboratory) and
the Mouse Atlas Projects.
NCI-FCRF (National Cancer Institute, Frederick Cancer Research Facility)

*Seems a very exhaustive and useful source. Lots of things still to
study.*

2D Protein Gel Databases
[Figure: maintained by the Image Processing Section, which provides
WebGel, Flicker and dbEngine, and maintains the gel analysis
software GELLAB II.]
WebGel:
WebGel is an Internet-based, interactive, qualitative and
quantitative gel database analysis system.
A WebGel database contains previously quantified gel data
generated from a stand-alone quantitative gel analysis system.

wbdemoDB: demonstration database of serum proteins in a
fetal alcohol syndrome study.
melanie2DB: demonstration database of E. coli gels from the
Melanie 2.3 demonstration database.
fasDB: database of serum proteins in a fetal alcohol
syndrome study.
General flow for proteomics analysis
SEPARATION → IDENTIFICATION
Current Proteomics Technologies
• Proteome profiling/separation
– 2D SDS PAGE (two-dimensional sodium
dodecylsulphate polyacrylamide gel electrophoresis)
– 2-D LC/LC
(LC = Liquid Chromatography)
– 2-D LC/MS
(MS= Mass spectrometry)
• Protein identification
– Peptide mass fingerprint
– Tandem Mass Spectrometry (MS/MS)
• Quantitative proteomics
– ICAT (isotope-coded affinity tag)
2D-SDS PAGE gel
1) Sample loading
2) Remove the cover sheet from the IEF gel
3) Place the strip gel in the focusing tray
4) Place the strip on the top of the SDS-PAGE gel
2D-SDS PAGE gel
The first dimension
(separation by isoelectric focusing)
- gel with an immobilised pH gradient
- electric current causes charged
proteins to move until they reach their
isoelectric point
(the pH at which the net charge is 0)
Isoelectric point (pI)
• Separation by charge along a stable pH gradient (pH 4 to 10):
– Low pH: protein is positively charged
– High pH: protein is negatively charged
– At the isoelectric point the protein has no net charge and
therefore no longer migrates in the electric field.
2D-SDS PAGE gel
The second dimension
(separation by mass)
- pH gel strip is loaded onto an SDS gel
- SDS denatures and linearises the
protein (to make movement solely
dependent on mass, not shape)
2D-SDS PAGE gel
2D-gel technique example
Advantages vs. Disadvantages
Advantages:
• Good resolution of proteins
• Detection of posttranslational modifications
Disadvantages:
• Not for hydrophobic proteins
• Limited by pH range
• Not easy for low abundant proteins
• Analysis and quantification are difficult
2D - LC/LC
• Study protein complexes without gel electrophoresis
• Proteins are digested (trypsin)
• Peptides all bind to cation exchange column
• Successive elution with increasing salt gradients separates
peptides by charge
• Peptides are separated by hydrophobicity on reverse phase column
• Complex mixture is simplified prior to MS/MS by 2D LC
Reverse Phase column
Polypeptides enter the column in the mobile phase…
…the hydrophobic “foot” of each polypeptide adsorbs to the
hydrophobic (non polar) surface of the reverse-phase
material (stationary phase) where it remains until…
…the organic modifier concentration rises to the critical
concentration and desorbs the polypeptides
2D - LC/MS
Methods for protein identification
Mass Spectrometry (MS) Stages
• Introduce sample to the instrument
• Generate ions in the gas phase
• Separate ions on the basis of differences in m/z
with a mass analyzer
• Detect ions
How does protein sequencing work?
• Use tandem MS: two mass analyzers in series with a collision
cell in between
• Collision cell: a region where the ions collide with a gas (He,
Ne, Ar), resulting in fragmentation of the ion
• Fragmentation of the peptides in the collision cell occurs in a
predictable fashion, mainly at the peptide bonds (also
phosphoester bonds)
• The resulting daughter ions have masses that are consistent with
known molecular weights of dipeptides, tripeptides,
tetrapeptides…
[Figure: Ser-Glu-Leu-Ile-Arg-Trp enters the collision cell and yields the
fragment series Ser-Glu-Leu-Ile-Arg, Ser-Glu-Leu-Ile, Ser-Glu-Leu, etc.]
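The fragment ladder for the example peptide Ser-Glu-Leu-Ile-Arg-Trp can be worked through numerically. The sketch below computes singly charged m/z values for the N-terminal fragments (usually called b-ions), using monoisotopic residue masses. The function name is hypothetical, and real spectra also contain other ion series that are ignored here.

```python
# Monoisotopic residue masses in Da for the residues used in the example.
RESIDUE_MASS = {
    "S": 87.03203, "E": 129.04259, "L": 113.08406,
    "I": 113.08406, "R": 156.10111, "W": 186.07931,
}
PROTON = 1.00728  # mass of a proton, Da


def b_ion_ladder(peptide):
    """Singly charged b-ion m/z for each N-terminal fragment.

    The full-length peptide is left out: only fragments shorter than
    the precursor are listed.
    """
    ladder = []
    running = 0.0
    for i, residue in enumerate(peptide[:-1], start=1):
        running += RESIDUE_MASS[residue]
        ladder.append((peptide[:i], round(running + PROTON, 4)))
    return ladder


for fragment, mz in b_ion_ladder("SELIRW"):
    print(fragment, mz)
```

The difference between consecutive ladder values equals one residue mass, which is how the sequence is read off. Leu and Ile have the same mass, so they cannot be told apart from these differences alone.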
Tandem Mass Spectrometry
• Protein is digested (trypsin)
• Isolates individual peptide fragments for 2nd mass spec – can
obtain peptide sequence
• Compare peptide sequence with protein databases
Advantages vs. Disadvantages
Advantages:
• Determination of MW and aa. sequence
• Detection of posttranslational modifications
• High-throughput capability
Disadvantages:
• High capital costs
• Requires sequence databases for analysis
Protein identification by Peptide Mass fingerprint
• Use MS to measure the masses of proteolytic peptide fragments.
• Identification is done by matching the
measured peptide masses to
corresponding peptide masses from
protein or nucleotide sequence databases.
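The matching step can be sketched as an in-silico tryptic digest followed by a mass comparison. The cleavage rule used here (after K or R, but not before P) is the usual simplified trypsin rule, the mass tolerance is an arbitrary assumption, and the function names are hypothetical. Real search engines also handle missed cleavages and modifications and apply statistical scoring.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(sequence):
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def fingerprint_score(measured_masses, sequence, tolerance=0.2):
    """Count measured masses that match a theoretical tryptic peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        1 for m in measured_masses
        if any(abs(m - t) <= tolerance for t in theoretical)
    )
```

Scoring every database sequence with `fingerprint_score` and ranking by the number of matches gives a crude version of the identification described above.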
Mass spectrometry (MS)
• Protein is digested (trypsin)
• Mass spectrometry – method of separating
molecules based on mass/charge ratio,
e.g. MALDI-TOF
• Compare peptide m/z with protein databases
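Protein Identification by MS
[Figure: experimental path: spot removed from gel → fragmented using
trypsin → spectrum of fragments.
Database path: database of sequences (e.g. SwissProt) → artificially
trypsinated → artificial spectra built → library generated.
The spectrum of fragments is MATCHED against the library.]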
ISOTOPE-CODED AFFINITY TAG
(ICAT): a quantitative method
• Label protein samples with heavy and light
reagent
• Reagent contains affinity tag and heavy or light
isotopes
– Chemically reactive group: forms a
covalent bond to the protein or peptide
– Isotope-labeled linker: heavy or light,
depending on which isotope is used
– Affinity tag: enables the protein or
peptide bearing an ICAT to be isolated by
affinity chromatography in a single step
Example of an ICAT Reagent
• Biotin affinity tag: binds tightly to streptavidin-agarose resin
• Linker: heavy version will have deuteriums at *;
light version will have hydrogens at *
• Reactive group: thiol-reactive group will bind to Cys
[Figure: chemical structure of the reagent, with the isotope-labelled
positions marked *]
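How does ICAT work?
[Figure: ICAT workflow. Lyse and label (light and heavy) → mix →
proteolysis (e.g. trypsin) → affinity isolation on streptavidin beads →
quantification by MS (light/heavy peak pair, relative intensity 0-100,
m/z 550-590) → identification by MS/MS (m/z 200-600; example peptide
NH2-EACDPLR-COOH).]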
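The quantification step can be sketched as finding light/heavy peak pairs in one MS scan and taking their intensity ratio. The mass shift used below assumes the original ICAT reagent with eight deuteriums in the heavy linker (about 8.05 Da per labelled cysteine); that number is not stated in the slides, and the function name, tolerance and peak values are hypothetical.

```python
HEAVY_LIGHT_SHIFT = 8.0502  # Da per labelled Cys, assuming 8 deuteriums


def pair_icat_peaks(peaks, charge=1, n_cys=1, tolerance=0.05):
    """Find light/heavy peak pairs and return their intensity ratios.

    peaks: list of (m/z, intensity) tuples from one MS scan.
    Returns a list of (light m/z, heavy m/z, heavy/light ratio).
    """
    expected_gap = HEAVY_LIGHT_SHIFT * n_cys / charge
    pairs = []
    for light_mz, light_intensity in peaks:
        if light_intensity <= 0:
            continue  # cannot form a ratio against an empty peak
        for heavy_mz, heavy_intensity in peaks:
            if abs((heavy_mz - light_mz) - expected_gap) <= tolerance:
                ratio = heavy_intensity / light_intensity
                pairs.append((light_mz, heavy_mz, ratio))
    return pairs
```

A ratio of 0.5 for a pair would suggest the protein is about half as abundant in the heavy-labelled sample as in the light-labelled one.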
Advantages vs. Disadvantages
Advantages:
• Estimates relative protein levels between samples with a
reasonable level of accuracy (within 10%)
• Can be used on complex mixtures of proteins
• Cys-specific label reduces sample complexity
• Peptides can be sequenced directly if tandem MS-MS is used
Disadvantages:
• Yield and non specificity
• Slight chromatography differences
• Expensive
• Tag fragmentation
• Meaning of relative quantification information
• No presence of cysteine residues or not accessible
by ICAT reagent
